SCAN-O-MATIC PHENOMICS

We screened a CRISPR interference (CRISPRi) library of >9000 Saccharomyces cerevisiae strains in which >98% of all essential and respiratory growth-essential genes were targeted with multiple gRNAs. The screen was performed on the high-throughput, high-resolution scan-o-matic platform (Zackrisson et al., 2016), where each strain is analyzed separately, so that high-resolution growth curves are generated without influence or competition from other strains.

ACETIC ACID TITRATION

In an ideal library screen (with all strains derived from the same background), the phenotypes under a given test condition should be approximately normally distributed and widely spread, so that the best and the worst performers in the library can be picked. We therefore need a stressor concentration (here acetic acid) that is severe enough to induce large phenotypic variability while still allowing most strains to grow and yield a quantitative phenotype. To find it, plates 7 and 8 were pre-screened at different acetic acid concentrations (0, 50 mM, 75 mM, 100 mM, and 150 mM).

Unfortunately, the spatial control strain at this point was BY4741, which did not grow at 150 mM (BY4741 was later replaced with CC23, i.e. one of the CRISPRi control strains with a gRNA non-homologous to the Saccharomyces cerevisiae genome). Therefore, to compare our results we used the absolute generation time (without any normalization for spatial bias). We assumed that the spatial bias would be very similar across the test plates; since at this point we only look at the phenotypic variability among the strains, it should not severely influence the conclusion of this titration round.

The raw absolute data of plates 7 and 8 are available in the SOM_AA_TITRA folder within the RAW_DATA folder. The files are named in the following format:

phenotypes.Absolute.Atc7.5_aa[acetic acid concentration in mM]_p[plate number].csv

For example, the result of plate 7 in 50mM of acetic acid is available in the file phenotypes.Absolute.Atc7.5_aa50_p7.csv
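
As a minimal sketch of how these names could be assembled programmatically, the snippet below builds a file name from a concentration and a plate number. The helper name titration_file is hypothetical and not part of the original analysis, and it is an assumption that the 0 mM files follow the same pattern (aa0).

```r
# Hypothetical helper (not from the original analysis): build the name of a
# titration file from the acetic acid concentration (mM) and the plate number.
titration_file <- function(conc_mM, plate) {
  sprintf("phenotypes.Absolute.Atc7.5_aa%d_p%d.csv",
          as.integer(conc_mM), as.integer(plate))
}

# Single file: plate 7 at 50 mM
titration_file(50, 7)

# All concentration/plate combinations of the titration round
grid <- expand.grid(conc_mM = c(0, 50, 75, 100, 150), plate = c(7, 8))
titration_files <- titration_file(grid$conc_mM, grid$plate)
```

The resulting names could then be combined with the folder path, e.g. with file.path(), before reading the files.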

For ease of analysis, we compiled the data into a single .csv file, which is available in the COMPILED_DATA folder.

Acetic acid titration data : 20210120_AA_titration_absolute_compiled.csv

  • Import the data
AA_titration_data <- read.csv("COMPILED_DATA/20210120_AA_titration_absolute_compiled.csv", na.strings = "NoGrowth")
  • Install packages: of these, ggplot2 and reshape will be used frequently later for data visualization

  • ggplot2

  • reshape

  • ggridges

  • Prepare the data in the long format required by the ggplot2 package using reshape

AA_titration_data_reshape <- reshape(data=AA_titration_data, idvar="gRNA_name",
                                     varying = colnames(AA_titration_data)[3:7],
                                     v.names=c("Generation_time"),
                                     new.row.names = 1:30000,
                                     direction="long",
                                     timevar = "Condition",
                                     times = colnames(AA_titration_data)[3:7])
  • Plot ridgeline plots: a convenient way to compare the density traces of multiple datasets
Figure 1: Density traces of the absolute generation time of the strains in plates 7 and 8 at different concentrations of acetic acid
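
The reshaping and plotting steps can be sketched on toy data as follows. This is an illustration under assumptions, not the code that produced Figure 1: the column names aa0, aa50 and aa150 and all generation times are invented, and the plotting part assumes that the ggplot2 and ggridges packages are installed (it is skipped otherwise).

```r
# Toy wide-format table: one row per strain, one column per concentration.
# Column names and values are made up for illustration.
toy_wide <- data.frame(
  gRNA_name = paste0("g", 1:6),
  Plate     = rep(c(7, 8), each = 3),
  aa0       = c(2.4, 2.5, 2.6, 2.5, 2.7, 2.4),
  aa50      = c(3.0, 3.4, 3.1, 3.3, 3.6, 2.9),
  aa150     = c(6.2, 9.8, 7.5, NA, 11.0, 8.1),
  stringsAsFactors = FALSE
)
conc_cols <- c("aa0", "aa50", "aa150")

# Wide -> long with base R reshape(): one row per strain and concentration
toy_long <- reshape(
  data = toy_wide, idvar = "gRNA_name",
  varying = conc_cols, v.names = "Generation_time",
  timevar = "Condition", times = conc_cols,
  direction = "long"
)
# Keep the concentrations in increasing order on the plot axis
toy_long$Condition <- factor(toy_long$Condition, levels = conc_cols)

# Ridgeline plot: one density trace per concentration
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggridges", quietly = TRUE)) {
  ridge_plot <- ggplot2::ggplot(
    toy_long,
    ggplot2::aes(x = Generation_time, y = Condition)
  ) +
    ggridges::geom_density_ridges(na.rm = TRUE) +
    ggplot2::labs(x = "Absolute generation time", y = "Acetic acid condition")
}
```

Turning Condition into a factor with explicit levels is a design choice that fixes the order of the ridges; without it the conditions would be sorted alphabetically.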


CONCLUSION OF ACETIC ACID TITRATION

At 150 mM we observed the largest phenotypic variability among the strains of plates 7 and 8. Therefore, 150 mM was selected as the acetic acid concentration for screening the entire library.

IMPORT SCAN-O-MATIC RAW DATA

The phenotypic data generated in the scan-o-matic screening are provided in .csv format. We extract both the absolute and the normalized phenotypes.

The CRISPRi strains in the library were arrayed in 24 plates in 384 format. Each CRISPRi plate was subjected to two different conditions (basal and 150 mM acetic acid). Therefore, four different files are generated for each plate (absolute and normalized phenotypes for each of the two conditions). All files generated in a single independent experimental round are stored in a single folder.

  • SOM_SCR_R001 : Raw data for round1

  • SOM_SCR_R002 : Raw data for round2

ABSOLUTE DATA

The Absolute dataset gives the extracted phenotypes without any spatial normalization

NORMALIZED DATA

The Normalized dataset is generated after removal of any spatial bias. The values are on a log2 scale and are referred to as Log Strain Coefficient (LSC) values.

FILE NAMING

Each file name contains the condition and the plate identifier, so that the files can easily be called programmatically.

E.g. the file with the plate 1 absolute data in the basal (Ctrl) condition contains the string
Ctrl1.phenotypes.Absolute
and the file with the plate 1 normalized data under acetic acid (aa) stress contains the string
aa1.phenotypes.Normalized
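
As a rough sketch, the identifier strings for all plates, conditions and data types of one round could be enumerated as follows. The object names are illustrative; whether the actual file names carry further prefixes or suffixes around these strings is not shown here.

```r
# Enumerate the identifier strings for all plates, conditions and data types.
plates     <- 1:24
conditions <- c("Ctrl", "aa")              # basal and acetic acid
types      <- c("Absolute", "Normalized")

file_keys <- expand.grid(plate = plates, condition = conditions, type = types,
                         stringsAsFactors = FALSE)
file_keys$pattern <- paste0(file_keys$condition, file_keys$plate,
                            ".phenotypes.", file_keys$type)

head(file_keys$pattern, 3)
nrow(file_keys)  # 24 plates x 2 conditions x 2 data types
```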

PURPOSE 1

At the end of this data import session, a single data.frame will be generated with the data of all 24 plates. The whole dataset will be labeled with the strain attributes using the metadata key file (provided in the COMPILED_DATA folder). The data import below is shown only for the Round2 dataset; the Round1 dataset can be generated by changing the folder location.

METADATA KEY FILE : library_keyfile1536.csv

IMPORTING THE METADATA FILE

Metadata_CRISPRi <- read.csv("COMPILED_DATA/library_keyfile1536.csv", na.strings = "#N/A", stringsAsFactors = FALSE)
str(Metadata_CRISPRi)
## 'data.frame':    36864 obs. of  11 variables:
##  $ SL_No             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gRNA_name         : chr  "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
##  $ Seq               : chr  "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
##  $ SOURCEPLATEID     : chr  "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
##  $ SOURCEDENSITY     : chr  "384A" "384A" "384A" "384A" ...
##  $ SOURCECOLONYCOLUMN: int  1 1 2 2 3 3 4 4 5 5 ...
##  $ SOURCECOLONYROW   : chr  "A" "A" "A" "A" ...
##  $ border            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ GENE              : chr  "FBP26" "FBP26" "HMI1" "HMI1" ...
##  $ Control.gRNA      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536     : chr  "A1" "A2" "A3" "A4" ...

GENERATE BASAL ABSOLUTE DATASET

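m <- vector(mode = "character", length = 0)           # identifier string of the current plate
file.names <- vector(mode = "character", length = 0)  # path of the matching file for each plate
temp_df <- data.frame()
data_Ctrl_Abs <- data.frame()
for(i in 1:24){
  # Build the identifier string for plate i, e.g. "Ctrl1.phenotypes.Absolute"
  m <- paste0("Ctrl", i, ".phenotypes.Absolute")
  # Locate the file of plate i in the Round2 folder (exactly one match is expected)
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  # "NoGrowth" entries are read as NA
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  # Append the 1536 rows of plate i
  data_Ctrl_Abs <- rbind(data_Ctrl_Abs, temp_df)
}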
str(data_Ctrl_Abs)
## 'data.frame':    36864 obs. of  18 variables:
##  $ Plate                                   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Row                                     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Column                                  : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Phenotypes.InitialValue                 : num  80116 83495 92467 97104 99622 ...
##  $ Phenotypes.ExperimentBaseLine           : num  81419 84504 94039 98641 101354 ...
##  $ Phenotypes.ExperimentEndAverage         : num  8615209 6710686 6037319 5554230 5234758 ...
##  $ Phenotypes.ColonySize48h                : num  6487316 5329111 4946774 4684166 4440201 ...
##  $ Phenotypes.ChapmanRichardsParam2        : num  14.5 12.8 42.7 17.6 18.3 ...
##  $ Phenotypes.ChapmanRichardsParam3        : num  -2.74 -2.65 -2.58 -2.52 -2.47 ...
##  $ Phenotypes.ChapmanRichardsParamXtra     : num  16 16 16.2 16.3 16.3 ...
##  $ Phenotypes.ChapmanRichardsParam1        : num  1.95 1.89 1.84 1.8 1.77 ...
##  $ Phenotypes.ChapmanRichardsParam4        : num  -31.33 -31.47 -3.93 -2.78 -2.48 ...
##  $ Phenotypes.GenerationTimeStErrOfEstimate: num  0.012879 0.002174 0.000633 0.001667 0.000759 ...
##  $ Phenotypes.ExperimentGrowthYield        : num  8533790 6626182 5943280 5455589 5133404 ...
##  $ Phenotypes.GenerationTime               : num  2.53 2.48 2.51 2.56 2.53 ...
##  $ Phenotypes.ExperimentPopulationDoublings: num  6.73 6.31 6 5.82 5.69 ...
##  $ Phenotypes.ChapmanRichardsFit           : num  0.999 0.999 0.999 0.999 0.999 ...
##  $ Phenotypes.GenerationTimeWhen           : num  6.14 4.1 4.1 3.76 3.76 ...

Several phenotypes are extracted. However, the most useful ones for this study are:

  • Column No: 14 i.e. Phenotypes.ExperimentGrowthYield
  • Column No: 15 i.e. Phenotypes.GenerationTime

Extract only these two columns into the final data.frame.
Rename the columns to prevent any ambiguity.

data_Ctrl_Abs_Trim <- data_Ctrl_Abs[, 14:15]
colnames(data_Ctrl_Abs_Trim) <- c("CTRL_Y_ABS", "CTRL_GT_ABS")
str(data_Ctrl_Abs_Trim)
## 'data.frame':    36864 obs. of  2 variables:
##  $ CTRL_Y_ABS : num  8533790 6626182 5943280 5455589 5133404 ...
##  $ CTRL_GT_ABS: num  2.53 2.48 2.51 2.56 2.53 ...
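
Selecting columns by position depends on the column order of the scan-o-matic output. A slightly more defensive variant, sketched here on a toy table with illustrative values, selects the same two phenotypes by name; the object names toy_abs and toy_trim are hypothetical.

```r
# Toy stand-in for the imported table (values are illustrative only)
toy_abs <- data.frame(
  Plate = c(0, 0),
  Phenotypes.ExperimentGrowthYield = c(8533790, 6626182),
  Phenotypes.GenerationTime = c(2.53, 2.48)
)

# Select by name, so the code keeps working if the column order changes
keep <- c("Phenotypes.ExperimentGrowthYield", "Phenotypes.GenerationTime")
stopifnot(all(keep %in% colnames(toy_abs)))
toy_trim <- toy_abs[, keep]
colnames(toy_trim) <- c("CTRL_Y_ABS", "CTRL_GT_ABS")
```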

GENERATE ACETIC ACID ABSOLUTE DATASET

Following the same strategy as above

m <- vector(mode = "character", length = 0)
file.names<-vector(mode = "character", length = 0)
temp_df<-data.frame()
data_AA_Abs <- data.frame()
for(i in 1:24){
  m <- paste0("aa", i, ".phenotypes.Absolute") 
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  data_AA_Abs <- rbind(data_AA_Abs, temp_df)
}
data_AA_Abs_Trim <- data_AA_Abs[, 14:15]
colnames(data_AA_Abs_Trim) <- c("AA_Y_ABS", "AA_GT_ABS")

GENERATE BASAL NORMALIZED DATASET

m <- vector(mode = "character", length = 0)
file.names<-vector(mode = "character", length = 0)
temp_df<-data.frame()
data_Ctrl_Norm <- data.frame()
for(i in 1:24){
  m <- paste0("Ctrl", i, ".phenotypes.Normalized") 
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  data_Ctrl_Norm <- rbind(data_Ctrl_Norm, temp_df)
}
str(data_Ctrl_Norm)
## 'data.frame':    36864 obs. of  8 variables:
##  $ Plate                                   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Row                                     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Column                                  : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Phenotypes.ExperimentGrowthYield        : num  0.755 0.39 0.233 0.484 0.396 ...
##  $ Phenotypes.GenerationTime               : num  0.0505 0.0204 0.0389 -0.0173 -0.0327 ...
##  $ Phenotypes.ExperimentPopulationDoublings: num  0.1907 0.0991 0.0272 0.113 0.0818 ...
##  $ Phenotypes.ExperimentBaseLine           : num  -0.0884 -0.0347 0.1195 0.0367 0.0758 ...
##  $ Phenotypes.ColonySize48h                : num  0.584 0.301 0.193 0.397 0.32 ...

The most useful phenotypes for this study are:

  • Column No: 4 i.e. Phenotypes.ExperimentGrowthYield
  • Column No: 5 i.e. Phenotypes.GenerationTime

Extract only these two columns

data_Ctrl_Norm_Trim <- data_Ctrl_Norm[, 4:5]
colnames(data_Ctrl_Norm_Trim) <- c("CTRL_Y_NORM", "CTRL_GT_NORM")

GENERATE ACETIC ACID NORMALIZED DATASET

Same as above

m <- vector(mode = "character", length = 0)
file.names<-vector(mode = "character", length = 0)
temp_df<-data.frame()
data_AA_Norm <- data.frame()
for(i in 1:24){
  m <- paste0("aa", i, ".phenotypes.Normalized") 
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  data_AA_Norm <- rbind(data_AA_Norm, temp_df)
}
data_AA_Norm_Trim <- data_AA_Norm[, 4:5]
colnames(data_AA_Norm_Trim) <- c("AA_Y_NORM", "AA_GT_NORM")
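
The four import loops differ only in the condition prefix and the data type, so they could be folded into one function. The following is a sketch under assumptions, not the original code: the helper read_plates is hypothetical, and the demonstration uses two tiny made-up files in a temporary folder instead of the real raw data.

```r
# Hypothetical helper: read one condition/data type for all plates of a round.
read_plates <- function(folder, condition, type, plates = 1:24) {
  all_files <- list.files(folder, full.names = TRUE)
  per_plate <- lapply(plates, function(i) {
    key <- paste0(condition, i, ".phenotypes.", type)
    # fixed = TRUE: match the key literally, so "." is not a regex wildcard
    hit <- all_files[grepl(key, basename(all_files), fixed = TRUE)]
    if (length(hit) != 1) {
      stop("Expected exactly one file for ", key, ", found ", length(hit))
    }
    read.csv(hit, na.strings = "NoGrowth")
  })
  # Stack the per-plate tables in plate order
  do.call(rbind, per_plate)
}

# Demonstration on two tiny made-up files in a temporary folder
demo_dir <- file.path(tempdir(), "SOM_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Plate,GT", "0,2.5", "0,NoGrowth"),
           file.path(demo_dir, "Ctrl1.phenotypes.Absolute.csv"))
writeLines(c("Plate,GT", "0,2.7", "0,2.6"),
           file.path(demo_dir, "Ctrl2.phenotypes.Absolute.csv"))
demo <- read_plates(demo_dir, "Ctrl", "Absolute", plates = 1:2)
```

Collecting the plates in a list and binding them once avoids growing the data.frame inside the loop, and the explicit check fails early if a plate file is missing or matched more than once.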

COMBINE THE DATASETS TO OBTAIN FINAL DATAFRAME

Trimmed datasets are combined to obtain the final data.frame. The combined data frame is labeled as data from ROUND2

R <- rep("2nd_round", 36864)
Round_ID <- data.frame(R, stringsAsFactors = FALSE)
whole_data_R2 <- cbind(Metadata_CRISPRi, 
                       Round_ID, 
                       data_Ctrl_Abs_Trim, 
                       data_AA_Abs_Trim, 
                       data_Ctrl_Norm_Trim, 
                       data_AA_Norm_Trim)
colnames(whole_data_R2)[12] <- "Round_ID"
str(whole_data_R2)
## 'data.frame':    36864 obs. of  20 variables:
##  $ SL_No             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gRNA_name         : chr  "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
##  $ Seq               : chr  "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
##  $ SOURCEPLATEID     : chr  "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
##  $ SOURCEDENSITY     : chr  "384A" "384A" "384A" "384A" ...
##  $ SOURCECOLONYCOLUMN: int  1 1 2 2 3 3 4 4 5 5 ...
##  $ SOURCECOLONYROW   : chr  "A" "A" "A" "A" ...
##  $ border            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ GENE              : chr  "FBP26" "FBP26" "HMI1" "HMI1" ...
##  $ Control.gRNA      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536     : chr  "A1" "A2" "A3" "A4" ...
##  $ Round_ID          : chr  "2nd_round" "2nd_round" "2nd_round" "2nd_round" ...
##  $ CTRL_Y_ABS        : num  8533790 6626182 5943280 5455589 5133404 ...
##  $ CTRL_GT_ABS       : num  2.53 2.48 2.51 2.56 2.53 ...
##  $ AA_Y_ABS          : num  2090439 2241914 1861277 1920070 1957912 ...
##  $ AA_GT_ABS         : num  8.92 8.43 9.19 8.91 8.87 ...
##  $ CTRL_Y_NORM       : num  0.755 0.39 0.233 0.484 0.396 ...
##  $ CTRL_GT_NORM      : num  0.0505 0.0204 0.0389 -0.0173 -0.0327 ...
##  $ AA_Y_NORM         : num  0.373 0.474 0.205 0.386 0.415 ...
##  $ AA_GT_NORM        : num  0.019 -0.0614 0.0621 -0.2135 -0.219 ...

IMPORT RESULTS FROM ROUND1

The results from Round1 are already compiled into a .csv file in the COMPILED_DATA folder.

Results 1st Round : 20190903_CRISPRi_Screen_aa_1st_round.csv

Import the dataset and label as data from ROUND1

First_round <- read.csv("COMPILED_DATA/20190903_CRISPRi_Screen_aa_1st_round.csv", 
                        na.strings = c("#N/A", "NoGrowth"), 
                        stringsAsFactors = FALSE)
R <- rep("1st_round", 36864)
Round_ID <- data.frame(R, stringsAsFactors = FALSE)
whole_data_R1 <- cbind(Metadata_CRISPRi, Round_ID, First_round[, 12:19])
colnames(whole_data_R1)[12] <- "Round_ID"
str(whole_data_R1)
## 'data.frame':    36864 obs. of  20 variables:
##  $ SL_No             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gRNA_name         : chr  "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
##  $ Seq               : chr  "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
##  $ SOURCEPLATEID     : chr  "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
##  $ SOURCEDENSITY     : chr  "384A" "384A" "384A" "384A" ...
##  $ SOURCECOLONYCOLUMN: int  1 1 2 2 3 3 4 4 5 5 ...
##  $ SOURCECOLONYROW   : chr  "A" "A" "A" "A" ...
##  $ border            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ GENE              : chr  "FBP26" "FBP26" "HMI1" "HMI1" ...
##  $ Control.gRNA      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536     : chr  "A1" "A2" "A3" "A4" ...
##  $ Round_ID          : chr  "1st_round" "1st_round" "1st_round" "1st_round" ...
##  $ CTRL_Y_ABS        : num  7046465 5380541 4841666 4517281 4209174 ...
##  $ CTRL_GT_ABS       : num  3.15 3.64 3.18 3.04 3.13 ...
##  $ AA_Y_ABS          : num  833657 844666 708042 734725 717147 ...
##  $ AA_GT_ABS         : num  13.1 13.6 14.5 14.7 13.6 ...
##  $ CTRL_Y_NORM       : num  0.899 0.51 0.358 0.565 0.463 ...
##  $ CTRL_GT_NORM      : num  -0.10076 0.10855 -0.08699 0.00504 0.0462 ...
##  $ AA_Y_NORM         : num  -0.362 -0.343 -0.598 -0.642 -0.677 ...
##  $ AA_GT_NORM        : num  0.273 0.331 0.423 0.425 0.313 ...

COMBINE THE DATASETS of ROUND 1 AND 2

whole_data_CRISPRi_aa <- rbind(whole_data_R1, whole_data_R2)

SCAN-O-MATIC PHENOMICS ANALYSIS

In this study, most of the downstream analysis was performed using the phenotype generation time (GT).

PURPOSE 2

In this session, downstream data processing and statistical analysis of SCAN-O-MATIC raw output will be performed

ESTIMATE THE LOG PHENOTYPIC INDEX (LPI) VALUES

The LPI of a strain is the difference between its normalized generation time (GT) or yield (Y) value (LSC, see IMPORT SCAN-O-MATIC RAW DATA) on the acetic acid stress plate and that in the basal condition. It gives a RELATIVE estimate of how a strain performed under acetic acid stress compared with the basal condition.

The RELATIVE GENERATION TIME, i.e. LPI_GT = LSC_GT_Acetic_Acid - LSC_GT_Basal

Likewise, the RELATIVE YIELD, i.e. LPI_Y = LSC_Y_Acetic_Acid - LSC_Y_Basal

# Columns 17-20 are CTRL_Y_NORM, CTRL_GT_NORM, AA_Y_NORM and AA_GT_NORM
whole_data_CRISPRi_aa[, 21] <- whole_data_CRISPRi_aa[, 19]-whole_data_CRISPRi_aa[, 17]  # LPI_Y
whole_data_CRISPRi_aa[, 22] <- whole_data_CRISPRi_aa[, 20]-whole_data_CRISPRi_aa[, 18]  # LPI_GT
colnames(whole_data_CRISPRi_aa)[21] <- "LPI_Y"
colnames(whole_data_CRISPRi_aa)[22] <- "LPI_GT"
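
As a small worked example, the LPI definition can be applied to the normalized values of the first colony of round 1 shown in the str() output of whole_data_R1. The data.frame name lsc is illustrative, and the last line is only an interpretation aid that relies on the LSC values being on a log2 scale, as stated for the Normalized dataset.

```r
# Normalized (LSC) values of the first colony in round 1, as displayed above
lsc <- data.frame(
  CTRL_Y_NORM  = 0.899,
  CTRL_GT_NORM = -0.10076,
  AA_Y_NORM    = -0.362,
  AA_GT_NORM   = 0.273
)

# LPI = LSC under acetic acid minus LSC in the basal condition
lsc$LPI_Y  <- lsc$AA_Y_NORM  - lsc$CTRL_Y_NORM
lsc$LPI_GT <- lsc$AA_GT_NORM - lsc$CTRL_GT_NORM

# A difference of log2 values corresponds to a ratio on the linear scale
lsc$GT_ratio <- 2^lsc$LPI_GT
```

Referring to the columns by name rather than by position gives the same result and is easier to check by eye.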

PERFORM PLATE-WISE BATCH CORRECTION

Plate-wise batch correction was conducted by subtracting the median of the LSC GT values of all the individual colonies on a plate from the individual LSC GT values of the colonies growing on that plate. The same correction is applied to the yield (Y) values and to the acetic acid condition, separately for each experimental round.

i.e. if strainX is growing in the basal condition on plate Z, the corrected LSC_GT value for strainX in the basal condition is the following:

  • LSC_GT_Basal_Corrected(strainX) = LSC_GT_Basal(strainX) - Median(LSC_GT_Basal(plateZ))

plate_ID <- as.character(unique(whole_data_CRISPRi_aa$SOURCEPLATEID))
whole_data_CRISPRi_aa_corrected <- whole_data_CRISPRi_aa
med_LogLSCctrl_RND1_GT <- vector(mode = "integer", length = 0)
med_LogLSCaa_RND1_GT <- vector(mode = "integer", length = 0)
med_LogLSCctrl_RND2_GT <- vector(mode = "integer", length = 0)
med_LogLSCaa_RND2_GT <- vector(mode = "integer", length = 0)
med_LogLSCctrl_RND1_Y <- vector(mode = "integer", length = 0)
med_LogLSCaa_RND1_Y <- vector(mode = "integer", length = 0)
med_LogLSCctrl_RND2_Y <- vector(mode = "integer", length = 0)
med_LogLSCaa_RND2_Y <- vector(mode = "integer", length = 0)

for(i in 1:24){
med_LogLSCctrl_RND1_GT[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                       & !is.na(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM) 
                                                                                         & whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])

med_LogLSCaa_RND1_GT[i] <- median(whole_data_CRISPRi_aa_corrected$AA_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                     & !is.na(whole_data_CRISPRi_aa_corrected$AA_GT_NORM) 
                                                                                     & whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])

med_LogLSCctrl_RND2_GT[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                         & !is.na(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM) 
                                                                                         & whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])

med_LogLSCaa_RND2_GT[i] <- median(whole_data_CRISPRi_aa_corrected$AA_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                     & !is.na(whole_data_CRISPRi_aa_corrected$AA_GT_NORM) 
                                                                                     & whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])

med_LogLSCctrl_RND1_Y[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                       & !is.na(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM) 
                                                                                       & whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])

med_LogLSCaa_RND1_Y[i] <- median(whole_data_CRISPRi_aa_corrected$AA_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                   & !is.na(whole_data_CRISPRi_aa_corrected$AA_Y_NORM) 
                                                                                   & whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])

med_LogLSCctrl_RND2_Y[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                       & !is.na(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM) 
                                                                                       & whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])

med_LogLSCaa_RND2_Y[i] <- median(whole_data_CRISPRi_aa_corrected$AA_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                   & !is.na(whole_data_CRISPRi_aa_corrected$AA_Y_NORM) 
                                                                                   & whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])
  
whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 23] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 17] - med_LogLSCctrl_RND1_Y[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 24] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 18] - med_LogLSCctrl_RND1_GT[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 23] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 17] - med_LogLSCctrl_RND2_Y[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 24] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 18] - med_LogLSCctrl_RND2_GT[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 25] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 19] - med_LogLSCaa_RND1_Y[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 26] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 20] - med_LogLSCaa_RND1_GT[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 25] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 19] - med_LogLSCaa_RND2_Y[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 26] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 20] - med_LogLSCaa_RND2_GT[i]
}
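
The same per-plate, per-round median subtraction can be written more compactly with base R ave(). The sketch below uses toy values and a hypothetical helper center_by_plate; it illustrates the technique and is not the code used for the analysis.

```r
# Toy data: two plates in one round (values are illustrative only)
toy <- data.frame(
  SOURCEPLATEID = rep(c("P1", "P2"), each = 3),
  Round_ID      = "1st_round",
  CTRL_GT_NORM  = c(1, 2, 6, 10, NA, 14),
  stringsAsFactors = FALSE
)

# Subtract the median of each plate/round group from its members.
# NA values are ignored when taking the median and stay NA in the result.
center_by_plate <- function(x, plate, round) {
  ave(x, plate, round, FUN = function(v) v - median(v, na.rm = TRUE))
}

toy$CTRL_GT_NORM_CR <- center_by_plate(toy$CTRL_GT_NORM,
                                       toy$SOURCEPLATEID,
                                       toy$Round_ID)
```

Applied to the real table, one such call per normalized column (CTRL_Y_NORM, CTRL_GT_NORM, AA_Y_NORM, AA_GT_NORM) would replace the explicit loop over plates and rounds.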

ESTIMATE THE BATCH CORRECTED LOG PHENOTYPIC INDEX (LPI) VALUES

Estimate the corrected LPI values (see ESTIMATE THE LOG PHENOTYPIC INDEX (LPI) VALUES) based on the corrected LSC values

i.e. LPI_GT_corrected = LSC_GT_Acetic_Acid_corrected - LSC_GT_Basal_corrected

Estimate the corrected LPI_Y

whole_data_CRISPRi_aa_corrected[, 27] <- whole_data_CRISPRi_aa_corrected[, 25] - whole_data_CRISPRi_aa_corrected[, 23]

Estimate the corrected LPI_GT

whole_data_CRISPRi_aa_corrected[, 28] <- whole_data_CRISPRi_aa_corrected[, 26] - whole_data_CRISPRi_aa_corrected[, 24] 
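
Because the batch correction subtracts one constant per plate and condition, the corrected LPI of every colony on a plate differs from its uncorrected LPI by the same plate-level shift, namely the acetic acid median minus the basal median. A quick numerical check on toy values (illustrative only, for a single plate in a single round):

```r
# Toy LSC values for three colonies on one plate (illustrative only)
ctrl <- c(-0.10, 0.11, -0.09)
aa   <- c(0.27, 0.33, 0.42)

# Plate-wise batch correction: subtract the plate median in each condition
ctrl_cr <- ctrl - median(ctrl)
aa_cr   <- aa - median(aa)

lpi    <- aa - ctrl          # uncorrected LPI
lpi_cr <- aa_cr - ctrl_cr    # LPI from the corrected LSC values

# The two differ by a single plate-level constant
plate_shift <- median(aa) - median(ctrl)
all.equal(lpi_cr, lpi - plate_shift)
```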

SETTING THE NAMES OF THE NEW COLUMNS

colnm <- colnames(whole_data_CRISPRi_aa)[17:22]
colnm <- paste0(colnm, "_CR")
colnames(whole_data_CRISPRi_aa_corrected)[23:28] <- colnm
str(whole_data_CRISPRi_aa_corrected)
## 'data.frame':    73728 obs. of  28 variables:
##  $ SL_No             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gRNA_name         : chr  "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
##  $ Seq               : chr  "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
##  $ SOURCEPLATEID     : chr  "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
##  $ SOURCEDENSITY     : chr  "384A" "384A" "384A" "384A" ...
##  $ SOURCECOLONYCOLUMN: int  1 1 2 2 3 3 4 4 5 5 ...
##  $ SOURCECOLONYROW   : chr  "A" "A" "A" "A" ...
##  $ border            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ GENE              : chr  "FBP26" "FBP26" "HMI1" "HMI1" ...
##  $ Control.gRNA      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536     : chr  "A1" "A2" "A3" "A4" ...
##  $ Round_ID          : chr  "1st_round" "1st_round" "1st_round" "1st_round" ...
##  $ CTRL_Y_ABS        : num  7046465 5380541 4841666 4517281 4209174 ...
##  $ CTRL_GT_ABS       : num  3.15 3.64 3.18 3.04 3.13 ...
##  $ AA_Y_ABS          : num  833657 844666 708042 734725 717147 ...
##  $ AA_GT_ABS         : num  13.1 13.6 14.5 14.7 13.6 ...
##  $ CTRL_Y_NORM       : num  0.899 0.51 0.358 0.565 0.463 ...
##  $ CTRL_GT_NORM      : num  -0.10076 0.10855 -0.08699 0.00504 0.0462 ...
##  $ AA_Y_NORM         : num  -0.362 -0.343 -0.598 -0.642 -0.677 ...
##  $ AA_GT_NORM        : num  0.273 0.331 0.423 0.425 0.313 ...
##  $ LPI_Y             : num  -1.262 -0.854 -0.956 -1.207 -1.14 ...
##  $ LPI_GT            : num  0.374 0.223 0.51 0.42 0.266 ...
##  $ CTRL_Y_NORM_CR    : num  0.918 0.529 0.376 0.583 0.482 ...
##  $ CTRL_GT_NORM_CR   : num  -0.0877 0.1216 -0.074 0.0181 0.0592 ...
##  $ AA_Y_NORM_CR      : num  -0.0736 -0.0547 -0.3092 -0.3534 -0.3883 ...
##  $ AA_GT_NORM_CR     : num  0.19 0.248 0.34 0.342 0.229 ...
##  $ LPI_Y_CR          : num  -0.991 -0.583 -0.686 -0.937 -0.87 ...
##  $ LPI_GT_CR         : num  0.278 0.127 0.414 0.323 0.17 ...

EXTRACT ONLY THE BATCH CORRECTED COLUMNS

whole_data_CRISPRi_aa_2 <- whole_data_CRISPRi_aa_corrected[, c(1:16, 23:28)]
colnames(whole_data_CRISPRi_aa_2)[17:22] <- colnames(whole_data_CRISPRi_aa)[17:22]
str(whole_data_CRISPRi_aa_2)
## 'data.frame':    73728 obs. of  22 variables:
##  $ SL_No             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gRNA_name         : chr  "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
##  $ Seq               : chr  "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
##  $ SOURCEPLATEID     : chr  "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
##  $ SOURCEDENSITY     : chr  "384A" "384A" "384A" "384A" ...
##  $ SOURCECOLONYCOLUMN: int  1 1 2 2 3 3 4 4 5 5 ...
##  $ SOURCECOLONYROW   : chr  "A" "A" "A" "A" ...
##  $ border            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ GENE              : chr  "FBP26" "FBP26" "HMI1" "HMI1" ...
##  $ Control.gRNA      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536     : chr  "A1" "A2" "A3" "A4" ...
##  $ Round_ID          : chr  "1st_round" "1st_round" "1st_round" "1st_round" ...
##  $ CTRL_Y_ABS        : num  7046465 5380541 4841666 4517281 4209174 ...
##  $ CTRL_GT_ABS       : num  3.15 3.64 3.18 3.04 3.13 ...
##  $ AA_Y_ABS          : num  833657 844666 708042 734725 717147 ...
##  $ AA_GT_ABS         : num  13.1 13.6 14.5 14.7 13.6 ...
##  $ CTRL_Y_NORM       : num  0.918 0.529 0.376 0.583 0.482 ...
##  $ CTRL_GT_NORM      : num  -0.0877 0.1216 -0.074 0.0181 0.0592 ...
##  $ AA_Y_NORM         : num  -0.0736 -0.0547 -0.3092 -0.3534 -0.3883 ...
##  $ AA_GT_NORM        : num  0.19 0.248 0.34 0.342 0.229 ...
##  $ LPI_Y             : num  -0.991 -0.583 -0.686 -0.937 -0.87 ...
##  $ LPI_GT            : num  0.278 0.127 0.414 0.323 0.17 ...

CONSTRUCT A NEW DATA STRUCTURE

Construct a new data structure in which the data for each strain (each having a unique guide RNA) are in a separate row, with the replicates from the first and second rounds side by side. Also add the mean, median and standard deviation for each phenotype.

REMOVE ROWS WITH SPATIAL CONTROL STRAIN DATA

Data_CRISPRi_aa <- subset(whole_data_CRISPRi_aa_2, whole_data_CRISPRi_aa_2$gRNA_name!="SP_Ctrl_CC23")

CREATE A TABLE OF UNIQUE gRNA

df_unique_sgRNA <- data.frame(table(Data_CRISPRi_aa$gRNA_name))

ARRANGE THE DATA IN THE DESIRED FORMAT

R1<-vector(mode = "integer", length = 0)
R2<-vector(mode = "integer", length = 0)
test2<-data.frame()
n<-nrow(df_unique_sgRNA)
for(i in 1:n){
  R1 <- which(Data_CRISPRi_aa$gRNA_name==df_unique_sgRNA$Var1[i] & Data_CRISPRi_aa$Round_ID=="1st_round")
  R2 <- which(Data_CRISPRi_aa$gRNA_name==df_unique_sgRNA$Var1[i] & Data_CRISPRi_aa$Round_ID=="2nd_round")
  test1 <- Data_CRISPRi_aa[c(R1, R2), ]
  test2[i, c(1:8)]<-test1[1, c(2:4, 6:7, 9:11)]
  test2[i, c(9:14)] <- test1$CTRL_GT_NORM
  test2[i, 15] <- mean(test1$CTRL_GT_NORM[1:3])
  test2[i, 16] <- mean(test1$CTRL_GT_NORM[4:6])
  test2[i, 17] <- sd(test1$CTRL_GT_NORM[1:3])
  test2[i, 18] <- sd(test1$CTRL_GT_NORM[4:6])
  test2[i, 19] <- mean(test1$CTRL_GT_NORM[1:6])
  test2[i, 20] <- median(test1$CTRL_GT_NORM[1:6])
  test2[i, 21] <- sd(test1$CTRL_GT_NORM[1:6])
  test2[i, c(22:27)] <- test1$AA_GT_NORM
  test2[i, 28] <- mean(test1$AA_GT_NORM[1:3])
  test2[i, 29] <- mean(test1$AA_GT_NORM[4:6])
  test2[i, 30] <- sd(test1$AA_GT_NORM[1:3])
  test2[i, 31] <- sd(test1$AA_GT_NORM[4:6])
  test2[i, 32] <- mean(test1$AA_GT_NORM[1:6])
  test2[i, 33] <- median(test1$AA_GT_NORM[1:6])
  test2[i, 34] <- sd(test1$AA_GT_NORM[1:6])
  test2[i, c(35:40)] <- test1$LPI_GT
  test2[i, 41] <- mean(test1$LPI_GT[1:3])
  test2[i, 42] <- mean(test1$LPI_GT[4:6])
  test2[i, 43] <- sd(test1$LPI_GT[1:3])
  test2[i, 44] <- sd(test1$LPI_GT[4:6])
  test2[i, 45] <- mean(test1$LPI_GT[1:6])
  test2[i, 46] <- median(test1$LPI_GT[1:6])
  test2[i, 47] <- sd(test1$LPI_GT[1:6])
  test2[i, c(48:53)] <- test1$CTRL_Y_NORM
  test2[i, 54] <- mean(test1$CTRL_Y_NORM[1:3])
  test2[i, 55] <- mean(test1$CTRL_Y_NORM[4:6])
  test2[i, 56] <- sd(test1$CTRL_Y_NORM[1:3])
  test2[i, 57] <- sd(test1$CTRL_Y_NORM[4:6])
  test2[i, 58] <- mean(test1$CTRL_Y_NORM[1:6])
  test2[i, 59] <- median(test1$CTRL_Y_NORM[1:6])
  test2[i, 60] <- sd(test1$CTRL_Y_NORM[1:6])
  test2[i, c(61:66)] <- test1$AA_Y_NORM
  test2[i, 67] <- mean(test1$AA_Y_NORM[1:3])
  test2[i, 68] <- mean(test1$AA_Y_NORM[4:6])
  test2[i, 69] <- sd(test1$AA_Y_NORM[1:3])
  test2[i, 70] <- sd(test1$AA_Y_NORM[4:6])
  test2[i, 71] <- mean(test1$AA_Y_NORM[1:6])
  test2[i, 72] <- median(test1$AA_Y_NORM[1:6])
  test2[i, 73] <- sd(test1$AA_Y_NORM[1:6])
  test2[i, c(74:79)] <- test1$LPI_Y
  test2[i, 80] <- mean(test1$LPI_Y[1:3])
  test2[i, 81] <- mean(test1$LPI_Y[4:6])
  test2[i, 82] <- sd(test1$LPI_Y[1:3])
  test2[i, 83] <- sd(test1$LPI_Y[4:6])
  test2[i, 84] <- mean(test1$LPI_Y[1:6])
  test2[i, 85] <- median(test1$LPI_Y[1:6])
  test2[i, 86] <- sd(test1$LPI_Y[1:6])
}
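
The loop above writes thirteen values per phenotype in a fixed order: six replicates, then the per-round means and SDs, then the pooled mean, median and SD. As a minimal sketch of that layout, the hypothetical helper `summarise_reps()` below (not part of the original analysis) produces the same thirteen values for one strain and one phenotype, assuming the first three values belong to round 1 and the last three to round 2.

```r
# Hypothetical helper: summarise six replicate values of one phenotype
# (replicates 1-3 = round 1, replicates 4-6 = round 2), in the same
# order the loop uses for each 13-column block.
summarise_reps <- function(x) {
  stopifnot(length(x) == 6)
  r1 <- x[1:3]
  r2 <- x[4:6]
  out <- c(x,
           mean(r1), mean(r2),          # per-round means
           sd(r1), sd(r2),              # per-round standard deviations
           mean(x), median(x), sd(x))   # pooled over both rounds
  names(out) <- c("RND1_R1", "RND1_R2", "RND1_R3",
                  "RND2_R1", "RND2_R2", "RND2_R3",
                  "RND1_MEAN", "RND2_MEAN", "RND1_SD", "RND2_SD",
                  "RND1_2_MEAN", "RND1_2_MED", "RND1_2_SD")
  out
}

# Toy values, for illustration only
toy <- summarise_reps(c(1, 2, 3, 4, 5, 6))
print(toy)
```

As in the loop, a missing replicate propagates to the summary statistics here, because `mean()`, `median()` and `sd()` return NA when any input is NA.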

ASSIGN COLUMN NAMES

The column names are stored in a text file in the COMPILED_DATA folder. After assigning them, store the data.frame under a new name.

column_names <- read.table("COMPILED_DATA/Column_names.txt", header = FALSE, sep = "\t", as.is = TRUE)
colnames(test2) <- column_names$V1
Analysis_CRISPRi_aa_Complete <- test2
str(Analysis_CRISPRi_aa_Complete)
## 'data.frame':    9078 obs. of  86 variables:
##  $ gRNA_name          : chr  "AAR2-NRg-3" "AAR2-NRg-4" "AAR2-TRg-15" "AAR2-TRg-16" ...
##  $ Seq                : chr  "CCAGCGATAAGGAGGATCTT" "TGTGTCCTTTCTTCATCTCT" "AAAAGGAAAAAGTAATTAGG" "GTGAAAAGGAAAAAGTAATT" ...
##  $ SOURCEPLATEID      : chr  "R2877.H.023" "R2877.H.024" "R2877.H.023" "R2877.H.023" ...
##  $ SOURCECOLONYCOLUMN : int  5 21 22 20 21 18 6 7 9 8 ...
##  $ SOURCECOLONYROW    : chr  "O" "L" "P" "N" ...
##  $ GENE               : chr  "AAR2" "AAR2" "AAR2" "AAR2" ...
##  $ Control.gRNA       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536      : chr  "AC9" "W41" "AE43" "AA39" ...
##  $ CTRL_GT_RND1_R1    : num  0.00397 0.00586 0.10456 -0.01142 0.0174 ...
##  $ CTRL_GT_RND1_R2    : num  -0.0086 0.00166 0.07118 0.00256 0.04094 ...
##  $ CTRL_GT_RND1_R3    : num  -0.000625 -0.043918 -0.009614 0.007969 -0.014018 ...
##  $ CTRL_GT_RND2_R1    : num  0.0431 -0.0336 0.0292 0.0381 0.0742 ...
##  $ CTRL_GT_RND2_R2    : num  -0.0165 0.00744 0.01433 0.06599 -0.02973 ...
##  $ CTRL_GT_RND2_R3    : num  0.03266 -0.04468 -0.04455 0.01994 -0.00924 ...
##  $ CTRL_GT_RND1_MEAN  : num  -0.00175 -0.01213 0.05538 -0.0003 0.01477 ...
##  $ CTRL_GT_RND2_MEAN  : num  0.019756 -0.023599 -0.000341 0.041342 0.011747 ...
##  $ CTRL_GT_RND1_SD    : num  0.00636 0.02761 0.05871 0.01001 0.02757 ...
##  $ CTRL_GT_RND2_SD    : num  0.0318 0.0275 0.039 0.0232 0.0551 ...
##  $ CTRL_GT_RND1_2_MEAN: num  0.009 -0.0179 0.0275 0.0205 0.0133 ...
##  $ CTRL_GT_RND1_2_MED : num  0.00167 -0.01595 0.02176 0.01395 0.00408 ...
##  $ CTRL_GT_RND1_2_SD  : num  0.0237 0.0254 0.054 0.0278 0.039 ...
##  $ AA_GT_RND1_R1      : num  -0.0239 0.06 -0.0178 -0.1839 0.5993 ...
##  $ AA_GT_RND1_R2      : num  -0.0311 0.0763 -0.0145 -0.1293 0.2619 ...
##  $ AA_GT_RND1_R3      : num  0.0315 -0.0191 -0.0778 -0.0514 0.3276 ...
##  $ AA_GT_RND2_R1      : num  0.0142 -0.1375 0.111 0.0105 0.2021 ...
##  $ AA_GT_RND2_R2      : num  0.0383 -0.2266 0.0588 0.0966 0.1831 ...
##  $ AA_GT_RND2_R3      : num  0.0265 -0.2298 0.0789 0.0787 0.0114 ...
##  $ AA_GT_RND1_MEAN    : num  -0.00781 0.03905 -0.03671 -0.12151 0.39624 ...
##  $ AA_GT_RND2_MEAN    : num  0.0263 -0.198 0.0829 0.0619 0.1322 ...
##  $ AA_GT_RND1_SD      : num  0.0343 0.051 0.0356 0.0666 0.1789 ...
##  $ AA_GT_RND2_SD      : num  0.0121 0.0524 0.0263 0.0455 0.105 ...
##  $ AA_GT_RND1_2_MEAN  : num  0.00925 -0.07945 0.02308 -0.02979 0.2642 ...
##  $ AA_GT_RND1_2_MED   : num  0.0203 -0.0783 0.0221 -0.0205 0.232 ...
##  $ AA_GT_RND1_2_SD    : num  0.0296 0.1378 0.0712 0.1127 0.1953 ...
##  $ LPI_GT_RND1_R1     : num  -0.0278 0.0541 -0.1224 -0.1724 0.5819 ...
##  $ LPI_GT_RND1_R2     : num  -0.0225 0.0746 -0.0857 -0.1318 0.2209 ...
##  $ LPI_GT_RND1_R3     : num  0.0322 0.0248 -0.0682 -0.0594 0.3416 ...
##  $ LPI_GT_RND2_R1     : num  -0.0289 -0.1039 0.0818 -0.0276 0.1279 ...
##  $ LPI_GT_RND2_R2     : num  0.0548 -0.2341 0.0444 0.0306 0.2128 ...
##  $ LPI_GT_RND2_R3     : num  -0.0062 -0.1851 0.1234 0.0588 0.0206 ...
##  $ LPI_GT_RND1_MEAN   : num  -0.00605 0.05119 -0.09208 -0.12121 0.38147 ...
##  $ LPI_GT_RND2_MEAN   : num  0.00656 -0.17435 0.08321 0.02058 0.12042 ...
##  $ LPI_GT_RND1_SD     : num  0.0332 0.025 0.0276 0.0573 0.1837 ...
##  $ LPI_GT_RND2_SD     : num  0.0433 0.0657 0.0395 0.0441 0.0963 ...
##  $ LPI_GT_RND1_2_MEAN : num  0.000251 -0.061582 -0.004437 -0.050313 0.250944 ...
##  $ LPI_GT_RND1_2_MED  : num  -0.0143 -0.0396 -0.0119 -0.0435 0.2169 ...
##  $ LPI_GT_RND1_2_SD   : num  0.0352 0.1313 0.1007 0.0901 0.1941 ...
##  $ CTRL_Y_RND1_R1     : num  0.055 0.0573 0.0189 0.0779 -0.1151 ...
##  $ CTRL_Y_RND1_R2     : num  0.0131 0.0472 0.0129 0.0562 -0.0723 ...
##  $ CTRL_Y_RND1_R3     : num  0.0306 0.0109 -0.0236 0.1208 -0.0171 ...
##  $ CTRL_Y_RND2_R1     : num  0.0399 0.0113 -0.1422 0.0605 -0.0676 ...
##  $ CTRL_Y_RND2_R2     : num  0.02531 0.0066 -0.17285 -0.00976 -0.08442 ...
##  $ CTRL_Y_RND2_R3     : num  -0.0528 0.0089 0.00598 0.06854 -0.08083 ...
##  $ CTRL_Y_RND1_MEAN   : num  0.03288 0.03847 0.00276 0.08497 -0.06817 ...
##  $ CTRL_Y_RND2_MEAN   : num  0.00414 0.00893 -0.10301 0.03977 -0.07762 ...
##  $ CTRL_Y_RND1_SD     : num  0.0211 0.0244 0.023 0.0329 0.0491 ...
##  $ CTRL_Y_RND2_SD     : num  0.04985 0.00234 0.09563 0.04308 0.00885 ...
##  $ CTRL_Y_RND1_2_MEAN : num  0.0185 0.0237 -0.0501 0.0624 -0.0729 ...
##  $ CTRL_Y_RND1_2_MED  : num  0.028 0.0111 -0.0088 0.0645 -0.0766 ...
##  $ CTRL_Y_RND1_2_SD   : num  0.0377 0.0224 0.085 0.0423 0.032 ...
##  $ AA_Y_RND1_R1       : num  0.0672 0.0106 0.1785 0.272 -2.1 ...
##  $ AA_Y_RND1_R2       : num  -0.3832 0.0102 0.1196 0.232 -1.1873 ...
##  $ AA_Y_RND1_R3       : num  -0.1599 -0.0465 -0.0381 0.1036 -1.0143 ...
##  $ AA_Y_RND2_R1       : num  0.0503 0.3083 -0.2434 0.0968 -0.4223 ...
##  $ AA_Y_RND2_R2       : num  -0.0505 0.4391 -0.2526 -0.094 -0.304 ...
##  $ AA_Y_RND2_R3       : num  -0.00342 0.52795 -0.20049 -0.02924 -0.26528 ...
##  $ AA_Y_RND1_MEAN     : num  -0.15864 -0.00857 0.08665 0.20251 -1.43385 ...
##  $ AA_Y_RND2_MEAN     : num  -0.00121 0.42511 -0.23216 -0.00879 -0.33052 ...
##  $ AA_Y_RND1_SD       : num  0.2252 0.0329 0.112 0.088 0.5834 ...
##  $ AA_Y_RND2_SD       : num  0.0504 0.1105 0.0278 0.097 0.0818 ...
##  $ AA_Y_RND1_2_MEAN   : num  -0.0799 0.2083 -0.0728 0.0969 -0.8822 ...
##  $ AA_Y_RND1_2_MED    : num  -0.027 0.159 -0.119 0.1 -0.718 ...
##  $ AA_Y_RND1_2_SD     : num  0.17 0.248 0.189 0.142 0.71 ...
##  $ LPI_Y_RND1_R1      : num  0.0122 -0.0467 0.1595 0.1942 -1.9849 ...
##  $ LPI_Y_RND1_R2      : num  -0.3963 -0.0369 0.1066 0.1757 -1.1149 ...
##  $ LPI_Y_RND1_R3      : num  -0.1905 -0.0575 -0.0145 -0.0172 -0.9972 ...
##  $ LPI_Y_RND2_R1      : num  0.0104 0.297 -0.1012 0.0363 -0.3547 ...
##  $ LPI_Y_RND2_R2      : num  -0.0758 0.4325 -0.0797 -0.0842 -0.2196 ...
##  $ LPI_Y_RND2_R3      : num  0.0494 0.519 -0.2065 -0.0978 -0.1844 ...
##  $ LPI_Y_RND1_MEAN    : num  -0.1915 -0.047 0.0839 0.1175 -1.3657 ...
##  $ LPI_Y_RND2_MEAN    : num  -0.00536 0.41618 -0.12915 -0.04856 -0.2529 ...
##  $ LPI_Y_RND1_SD      : num  0.2042 0.0103 0.0892 0.1171 0.5395 ...
##  $ LPI_Y_RND2_SD      : num  0.0641 0.1119 0.0678 0.0738 0.0899 ...
##  $ LPI_Y_RND1_2_MEAN  : num  -0.0984 0.1846 -0.0226 0.0345 -0.8093 ...
##  $ LPI_Y_RND1_2_MED   : num  -0.03273 0.13006 -0.04712 0.00953 -0.67593 ...
##  $ LPI_Y_RND1_2_SD    : num  0.169 0.263 0.137 0.126 0.701 ...
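
The 86 column names in the output above follow a regular pattern: eight metadata columns followed by six phenotypes with thirteen suffixes each. The sketch below rebuilds the expected names from that pattern. The vector names `meta`, `phenotypes`, `suffixes` and `expected_names` are illustrative and do not appear in the original analysis; the name strings themselves are copied from the `str()` output.

```r
# Metadata columns, copied from the str() output
meta <- c("gRNA_name", "Seq", "SOURCEPLATEID", "SOURCECOLONYCOLUMN",
          "SOURCECOLONYROW", "GENE", "Control.gRNA", "Location_1536")

# Phenotype prefixes and per-phenotype suffixes, in column order
phenotypes <- c("CTRL_GT", "AA_GT", "LPI_GT", "CTRL_Y", "AA_Y", "LPI_Y")
suffixes <- c("RND1_R1", "RND1_R2", "RND1_R3",
              "RND2_R1", "RND2_R2", "RND2_R3",
              "RND1_MEAN", "RND2_MEAN", "RND1_SD", "RND2_SD",
              "RND1_2_MEAN", "RND1_2_MED", "RND1_2_SD")

# All 13 suffixes for the first phenotype, then the second, and so on
expected_names <- c(meta,
                    unlist(lapply(phenotypes,
                                  function(p) paste(p, suffixes, sep = "_"))))
length(expected_names)
```

A comparison such as `identical(colnames(Analysis_CRISPRi_aa_Complete), expected_names)` could then serve as a sanity check that the names read from the text file match the column order produced by the loop.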

PERFORM STATISTICAL ANALYSIS

Several statistical methods were applied to identify the best-fitting statistical model for this dataset. We start with the complete dataset and copy it under a new name to avoid altering the original.

Analysis_Final <- Analysis_CRISPRi_aa_Complete

METHOD 1

For METHOD 1, we hypothesized that the difference between the mean (µ) phenotypic performance of a specific CRISPRi strain (StrainX) across the two independent experimental rounds (n=2) and the mean phenotypic performance of all CRISPRi strains that fall within the interquartile range (IQR) of the complete dataset would be zero, and that any difference within the IQR arises by chance.

Null Hypothesis : µ(µLPI_GT_StrainX_Round1, µLPI_GT_StrainX_Round2)- µ(InterquartileRange_LPI_GT) = 0

RECALCULATION OF SOME PHENOTYPIC PARAMETERS

In this method, we estimate the mean and standard deviation (SD) of the LPI GT of Round 1 and Round 2 separately for each strain. When one or two of the three replicates of a strain in a round returned a missing value (NA), the mean for that round is calculated from the remaining non-NA replicates. The mean and SD statistics were therefore recalculated excluding the missing values. We implemented an if/else decision tree for this.

  • Recalculation of the mean and SD of the normalized generation time (LSC GT mean) at basal condition
for(i in 1:nrow(Analysis_Final)){
  x1 <- as.numeric(Analysis_Final[i, 9:11][which(!is.na(Analysis_Final[i, 9:11]))])
  x2 <- as.numeric(Analysis_Final[i, 12:14][which(!is.na(Analysis_Final[i, 12:14]))])
  if(length(x1)==0){
    Analysis_Final$CTRL_GT_RND1_MEAN[i] <- NA
  } else{
    Analysis_Final$CTRL_GT_RND1_MEAN[i] <- as.numeric(mean(x1))
  }
  if(length(x2)==0){
    Analysis_Final$CTRL_GT_RND2_MEAN[i] <- NA
  } else{
    Analysis_Final$CTRL_GT_RND2_MEAN[i] <- as.numeric(mean(x2))
  }
  if(sum(is.na(c(Analysis_Final$CTRL_GT_RND1_MEAN[i], Analysis_Final$CTRL_GT_RND2_MEAN[i])))==0){
    Analysis_Final$CTRL_GT_RND1_2_MEAN[i] <- as.numeric(mean(c(Analysis_Final$CTRL_GT_RND1_MEAN[i], Analysis_Final$CTRL_GT_RND2_MEAN[i])))
    Analysis_Final[i, 87] <- as.numeric(sd(c(Analysis_Final$CTRL_GT_RND1_MEAN[i], Analysis_Final$CTRL_GT_RND2_MEAN[i])))
  } else{
    Analysis_Final$CTRL_GT_RND1_2_MEAN[i] <- NA
    Analysis_Final[i, 87] <- NA
  }
}
colnames(Analysis_Final)[87] <- "CTRL_GT_MEAN_RND1_2_SD"
  • Recalculation of the mean and SD of the normalized generation time (LSC GT mean) at 150mM acetic acid
for(i in 1:nrow(Analysis_Final)){
  x1 <- as.numeric(Analysis_Final[i, 22:24][which(!is.na(Analysis_Final[i, 22:24]))])
  x2 <- as.numeric(Analysis_Final[i, 25:27][which(!is.na(Analysis_Final[i, 25:27]))])
  if(length(x1)==0){
    Analysis_Final$AA_GT_RND1_MEAN[i] <- NA
  } else{
    Analysis_Final$AA_GT_RND1_MEAN[i] <- as.numeric(mean(x1))
  }
  if(length(x2)==0){
    Analysis_Final$AA_GT_RND2_MEAN[i] <- NA
  } else{
    Analysis_Final$AA_GT_RND2_MEAN[i] <- as.numeric(mean(x2))
  }
  if(sum(is.na(c(Analysis_Final$AA_GT_RND1_MEAN[i], Analysis_Final$AA_GT_RND2_MEAN[i])))==0){
    Analysis_Final$AA_GT_RND1_2_MEAN[i] <- as.numeric(mean(c(Analysis_Final$AA_GT_RND1_MEAN[i], Analysis_Final$AA_GT_RND2_MEAN[i])))
    Analysis_Final[i, 88] <- as.numeric(sd(c(Analysis_Final$AA_GT_RND1_MEAN[i], Analysis_Final$AA_GT_RND2_MEAN[i])))
  } else{
    Analysis_Final$AA_GT_RND1_2_MEAN[i] <- NA
    Analysis_Final[i, 88] <- NA
  }
}
colnames(Analysis_Final)[88] <- "AA_GT_MEAN_RND1_2_SD"
  • Recalculation of the mean and SD of the relative generation time (LPI GT mean) at 150mM acetic acid
for(i in 1:nrow(Analysis_Final)){
  x1 <- as.numeric(Analysis_Final[i, 35:37][which(!is.na(Analysis_Final[i, 35:37]))])
  x2 <- as.numeric(Analysis_Final[i, 38:40][which(!is.na(Analysis_Final[i, 38:40]))])
  if(length(x1)==0){
    Analysis_Final$LPI_GT_RND1_MEAN[i] <- NA
  } else{
    Analysis_Final$LPI_GT_RND1_MEAN[i] <- as.numeric(mean(x1))
  }
  if(length(x2)==0){
    Analysis_Final$LPI_GT_RND2_MEAN[i] <- NA
  } else{
    Analysis_Final$LPI_GT_RND2_MEAN[i] <- as.numeric(mean(x2))
  }
  if(sum(is.na(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))==0){
    Analysis_Final$LPI_GT_RND1_2_MEAN[i] <- as.numeric(mean(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))
    Analysis_Final[i, 89] <- as.numeric(sd(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))
  } else{
    Analysis_Final$LPI_GT_RND1_2_MEAN[i] <- NA
    Analysis_Final[i, 89] <- NA
  }
}
colnames(Analysis_Final)[89] <- "LPI_GT_MEAN_RND1_2_SD"
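
The three loops above apply the same rule to different column ranges. As a minimal sketch, that rule can be written once with two hypothetical helpers, `round_mean()` and `combine_rounds()` (names chosen here for illustration, not taken from the original analysis), shown on toy values.

```r
# Hypothetical helper: mean of the non-missing replicates of one round,
# or NA when every replicate is missing (same rule as the if/else above).
round_mean <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else mean(x)
}

# Hypothetical helper: combine two round means as in the loops above.
# The pooled mean and its SD are NA unless both rounds have a mean.
combine_rounds <- function(m1, m2) {
  if (anyNA(c(m1, m2))) {
    c(mean = NA_real_, sd = NA_real_)
  } else {
    c(mean = mean(c(m1, m2)), sd = sd(c(m1, m2)))
  }
}

# Toy values, for illustration only
m1 <- round_mean(c(0.10, NA, 0.30))    # one replicate missing
m2 <- round_mean(c(NA, NA, NA))        # all replicates missing
m3 <- round_mean(c(0.40, 0.40, 0.40))  # no replicate missing
combine_rounds(m1, m2)                 # one round has no mean
combine_rounds(m1, m3)                 # both rounds have a mean
```

Because the SD is taken over only two round means, it equals the absolute difference between them divided by the square root of 2, so it measures nothing but the disagreement between rounds.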
EXTRACT ALL LPI GT MEAN DATA POINTS WITHIN INTER-QUARTILE-RANGE (IQR)

BOX PLOT - MEAN RELATIVE GENERATION TIME (LPI GT)

Figure 2: Boxplot of mean relative generation time (LPI GT) for all strains in the library

Display Box-plot statistics

box_stat_LPI_GT_R1_2_mean$stats
##             [,1]
## [1,] -0.16933911
## [2,] -0.02428792
## [3,]  0.02084505
## [4,]  0.07255828
## [5,]  0.21771505
  • 25th Percentile = -0.02428792
  • 75th Percentile = 0.07255828

We therefore extract the data points within the IQR:

Intermediate_50 <- Analysis_Final$LPI_GT_RND1_2_MEAN[which(Analysis_Final$LPI_GT_RND1_2_MEAN>=-0.02428792
                                                           &Analysis_Final$LPI_GT_RND1_2_MEAN<=0.07255828)]
summary(Intermediate_50)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -0.0242879 -0.0009547  0.0208382  0.0219831  0.0445144  0.0725032
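
The thresholds above are typed in by hand from the box-plot statistics. As a hedged alternative, the sketch below reads the lower and upper hinges directly from `boxplot.stats()`, assuming the box-plot object in the original was produced by a `boxplot()` call that is not shown. The helper name `within_hinges()` is hypothetical.

```r
# Hypothetical helper: keep the values lying between the lower and upper
# hinges reported by boxplot.stats() (elements 2 and 4 of $stats).
within_hinges <- function(x) {
  hinges <- grDevices::boxplot.stats(x)$stats[c(2, 4)]
  x[which(x >= hinges[1] & x <= hinges[2])]  # which() drops NA comparisons
}

toy <- c(1:9, NA)    # toy data, for illustration only
within_hinges(toy)   # hinges of 1:9 are 3 and 7
```

Note that box-plot hinges can differ slightly from the quartiles returned by `quantile()`, so the two approaches may select marginally different sets of values.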
ESTIMATE P-VALUE

The P-value is estimated with Welch’s two-sample, two-sided t-test (an adaptation of Student’s t-test).

for(i in 1:nrow(Analysis_Final)){
  if(sum(is.na(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))==0){
    P_value <- t.test(Intermediate_50, c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i]))
    Analysis_Final[i, 90] <- P_value$p.value
  } else{
    Analysis_Final[i, 90] <- NA
  }
}
colnames(Analysis_Final)[90] <- "P_value_M1"
FALSE DISCOVERY RATE ADJUSTMENT OF P-VALUE

P-value adjustment by BENJAMINI-HOCHBERG False Discovery Rate (FDR) method

Analysis_Final[which(!is.na(Analysis_Final$P_value_M1)), 91] <- p.adjust(Analysis_Final$P_value_M1[which(!is.na(Analysis_Final$P_value_M1))], 
                                                                      method = "BH", 
                                                                      n = length(Analysis_Final$P_value_M1[which(!is.na(Analysis_Final$P_value_M1))]))
colnames(Analysis_Final)[91] <- "P.adjusted_M1"
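
The call above subsets the non-missing P-values before adjusting them. A small check on toy values suggests this gives the same result as passing the full vector, since `p.adjust()` leaves NA entries as NA and by default counts only the non-missing values as comparisons. The variable names below are illustrative.

```r
# Toy p-values with missing entries, for illustration only
p <- c(0.001, NA, 0.04, 0.03, NA, 0.50)

# Direct call on the full vector: NA entries stay NA
adj_direct <- p.adjust(p, method = "BH")

# Subset first, adjust, then write back (the pattern used above)
adj_subset <- rep(NA_real_, length(p))
adj_subset[!is.na(p)] <- p.adjust(p[!is.na(p)], method = "BH")

all.equal(adj_direct, adj_subset)
```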
P-VALUE DIAGNOSTICS FOR METHOD 1

NUMBER OF SIGNIFICANT STRAINS

length(Analysis_Final$P_value_M1[which(Analysis_Final$P_value_M1<=0.05)])
## [1] 434
length(Analysis_Final$P.adjusted_M1[which(Analysis_Final$P.adjusted_M1<=0.05)])
## [1] 66
length(Analysis_Final$P_value_M1[which(Analysis_Final$P_value_M1<=0.1)])
## [1] 842
length(Analysis_Final$P.adjusted_M1[which(Analysis_Final$P.adjusted_M1<=0.1)])
## [1] 71
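
The four counts above repeat one pattern with different columns and thresholds. A minimal sketch of the same count as a reusable function follows; the name `count_sig()` is hypothetical and the values are toy data.

```r
# Hypothetical helper: number of p-values at or below a threshold,
# ignoring missing values
count_sig <- function(p, alpha = 0.05) sum(p <= alpha, na.rm = TRUE)

p_toy <- c(0.01, 0.05, 0.07, NA, 0.20)   # toy values, for illustration only
sapply(c(0.05, 0.10), function(a) count_sig(p_toy, a))
```

This is equivalent to `length(p[which(p <= alpha)])`, because `which()` also discards NA comparisons.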

P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS

Figure 3: P-value diagnostic by histogram Method 1
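
The code that produced this histogram is not shown. A minimal sketch of how such a diagnostic might be drawn is given below on simulated P-values; the bin width of 0.05 and all variable names are assumptions. For a well-behaved test, P-values from true null cases are expected to be roughly uniform, so a flat histogram with a peak near zero suggests that a subset of strains carries a real effect.

```r
set.seed(1)
# Toy p-values: mostly uniform (null-like) plus a small group near zero
p_toy <- c(runif(900), runif(100, min = 0, max = 0.01), NA)

# plot = FALSE returns the bin counts without drawing; omit it to draw
h <- hist(p_toy, breaks = seq(0, 1, by = 0.05), plot = FALSE)
h$counts[1]   # number of p-values in the first bin, [0, 0.05]
```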

CONCLUSIONS METHOD 1

Method 1 was too rigid because n=2. Even a small standard deviation between Round 1 and Round 2 renders an observation non-significant. The method places nearly all the weight on the variability between Round 1 and Round 2 rather than on the deviation from the mean of the intermediate 50%. Statistical Method 1 was therefore discarded after careful evaluation.
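
The limitation can be illustrated with toy numbers. In the sketch below, `ref` is a made-up stand-in for the intermediate 50% vector, and the two hypothetical strains are chosen so that one deviates strongly but varies between rounds while the other deviates only slightly but agrees closely between rounds.

```r
# Toy reference distribution standing in for the intermediate 50%
ref <- seq(-0.02, 0.07, length.out = 1000)

far_but_variable <- c(0.30, 0.50)   # large deviation, rounds differ by 0.20
near_but_tight   <- c(0.10, 0.11)   # small deviation, rounds differ by 0.01

# t.test() performs Welch's two-sided test by default
p_far  <- t.test(ref, far_but_variable)$p.value
p_near <- t.test(ref, near_but_tight)$p.value
c(p_far = p_far, p_near = p_near)
```

With only two values on the strain side, the test has roughly one degree of freedom, so the strain with the larger deviation receives the larger P-value simply because its two rounds disagree more.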

METHOD 2

For METHOD 2, we hypothesized that the difference between the mean (µ) phenotypic performance (LPI GT) of a specific CRISPRi strain (StrainX) in an independent experimental round (each with three technical replicates, i.e. n=3) and the mean phenotypic performance of all replicates of the CRISPRi control strains (with a gRNA targeting no genetic locus in S. cerevisiae) in that screening round would be zero, and that any difference within the phenotypic performance range of the CRISPRi control strains (LPI GT range) arises by chance.

Null Hypothesis : µStrainX(LPI_GTReplica1, LPI_GTReplica2, LPI_GTReplica3)- µCRISPRi_Control_Strains(LPI_GT) = 0

In this method, P-values were estimated for each strain in each round, and only strains that showed a significant difference in both rounds were considered for further analysis.

First, we copy the dataset under a new name to avoid any distortion down the line.

Analysis_Final_2 <- Analysis_Final
str(Analysis_Final_2)
## 'data.frame':    9078 obs. of  91 variables:
##  $ gRNA_name             : chr  "AAR2-NRg-3" "AAR2-NRg-4" "AAR2-TRg-15" "AAR2-TRg-16" ...
##  $ Seq                   : chr  "CCAGCGATAAGGAGGATCTT" "TGTGTCCTTTCTTCATCTCT" "AAAAGGAAAAAGTAATTAGG" "GTGAAAAGGAAAAAGTAATT" ...
##  $ SOURCEPLATEID         : chr  "R2877.H.023" "R2877.H.024" "R2877.H.023" "R2877.H.023" ...
##  $ SOURCECOLONYCOLUMN    : int  5 21 22 20 21 18 6 7 9 8 ...
##  $ SOURCECOLONYROW       : chr  "O" "L" "P" "N" ...
##  $ GENE                  : chr  "AAR2" "AAR2" "AAR2" "AAR2" ...
##  $ Control.gRNA          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536         : chr  "AC9" "W41" "AE43" "AA39" ...
##  $ CTRL_GT_RND1_R1       : num  0.00397 0.00586 0.10456 -0.01142 0.0174 ...
##  $ CTRL_GT_RND1_R2       : num  -0.0086 0.00166 0.07118 0.00256 0.04094 ...
##  $ CTRL_GT_RND1_R3       : num  -0.000625 -0.043918 -0.009614 0.007969 -0.014018 ...
##  $ CTRL_GT_RND2_R1       : num  0.0431 -0.0336 0.0292 0.0381 0.0742 ...
##  $ CTRL_GT_RND2_R2       : num  -0.0165 0.00744 0.01433 0.06599 -0.02973 ...
##  $ CTRL_GT_RND2_R3       : num  0.03266 -0.04468 -0.04455 0.01994 -0.00924 ...
##  $ CTRL_GT_RND1_MEAN     : num  -0.00175 -0.01213 0.05538 -0.0003 0.01477 ...
##  $ CTRL_GT_RND2_MEAN     : num  0.019756 -0.023599 -0.000341 0.041342 0.011747 ...
##  $ CTRL_GT_RND1_SD       : num  0.00636 0.02761 0.05871 0.01001 0.02757 ...
##  $ CTRL_GT_RND2_SD       : num  0.0318 0.0275 0.039 0.0232 0.0551 ...
##  $ CTRL_GT_RND1_2_MEAN   : num  0.009 -0.0179 0.0275 0.0205 0.0133 ...
##  $ CTRL_GT_RND1_2_MED    : num  0.00167 -0.01595 0.02176 0.01395 0.00408 ...
##  $ CTRL_GT_RND1_2_SD     : num  0.0237 0.0254 0.054 0.0278 0.039 ...
##  $ AA_GT_RND1_R1         : num  -0.0239 0.06 -0.0178 -0.1839 0.5993 ...
##  $ AA_GT_RND1_R2         : num  -0.0311 0.0763 -0.0145 -0.1293 0.2619 ...
##  $ AA_GT_RND1_R3         : num  0.0315 -0.0191 -0.0778 -0.0514 0.3276 ...
##  $ AA_GT_RND2_R1         : num  0.0142 -0.1375 0.111 0.0105 0.2021 ...
##  $ AA_GT_RND2_R2         : num  0.0383 -0.2266 0.0588 0.0966 0.1831 ...
##  $ AA_GT_RND2_R3         : num  0.0265 -0.2298 0.0789 0.0787 0.0114 ...
##  $ AA_GT_RND1_MEAN       : num  -0.00781 0.03905 -0.03671 -0.12151 0.39624 ...
##  $ AA_GT_RND2_MEAN       : num  0.0263 -0.198 0.0829 0.0619 0.1322 ...
##  $ AA_GT_RND1_SD         : num  0.0343 0.051 0.0356 0.0666 0.1789 ...
##  $ AA_GT_RND2_SD         : num  0.0121 0.0524 0.0263 0.0455 0.105 ...
##  $ AA_GT_RND1_2_MEAN     : num  0.00925 -0.07945 0.02308 -0.02979 0.2642 ...
##  $ AA_GT_RND1_2_MED      : num  0.0203 -0.0783 0.0221 -0.0205 0.232 ...
##  $ AA_GT_RND1_2_SD       : num  0.0296 0.1378 0.0712 0.1127 0.1953 ...
##  $ LPI_GT_RND1_R1        : num  -0.0278 0.0541 -0.1224 -0.1724 0.5819 ...
##  $ LPI_GT_RND1_R2        : num  -0.0225 0.0746 -0.0857 -0.1318 0.2209 ...
##  $ LPI_GT_RND1_R3        : num  0.0322 0.0248 -0.0682 -0.0594 0.3416 ...
##  $ LPI_GT_RND2_R1        : num  -0.0289 -0.1039 0.0818 -0.0276 0.1279 ...
##  $ LPI_GT_RND2_R2        : num  0.0548 -0.2341 0.0444 0.0306 0.2128 ...
##  $ LPI_GT_RND2_R3        : num  -0.0062 -0.1851 0.1234 0.0588 0.0206 ...
##  $ LPI_GT_RND1_MEAN      : num  -0.00605 0.05119 -0.09208 -0.12121 0.38147 ...
##  $ LPI_GT_RND2_MEAN      : num  0.00656 -0.17435 0.08321 0.02058 0.12042 ...
##  $ LPI_GT_RND1_SD        : num  0.0332 0.025 0.0276 0.0573 0.1837 ...
##  $ LPI_GT_RND2_SD        : num  0.0433 0.0657 0.0395 0.0441 0.0963 ...
##  $ LPI_GT_RND1_2_MEAN    : num  0.000251 -0.061582 -0.004437 -0.050313 0.250944 ...
##  $ LPI_GT_RND1_2_MED     : num  -0.0143 -0.0396 -0.0119 -0.0435 0.2169 ...
##  $ LPI_GT_RND1_2_SD      : num  0.0352 0.1313 0.1007 0.0901 0.1941 ...
##  $ CTRL_Y_RND1_R1        : num  0.055 0.0573 0.0189 0.0779 -0.1151 ...
##  $ CTRL_Y_RND1_R2        : num  0.0131 0.0472 0.0129 0.0562 -0.0723 ...
##  $ CTRL_Y_RND1_R3        : num  0.0306 0.0109 -0.0236 0.1208 -0.0171 ...
##  $ CTRL_Y_RND2_R1        : num  0.0399 0.0113 -0.1422 0.0605 -0.0676 ...
##  $ CTRL_Y_RND2_R2        : num  0.02531 0.0066 -0.17285 -0.00976 -0.08442 ...
##  $ CTRL_Y_RND2_R3        : num  -0.0528 0.0089 0.00598 0.06854 -0.08083 ...
##  $ CTRL_Y_RND1_MEAN      : num  0.03288 0.03847 0.00276 0.08497 -0.06817 ...
##  $ CTRL_Y_RND2_MEAN      : num  0.00414 0.00893 -0.10301 0.03977 -0.07762 ...
##  $ CTRL_Y_RND1_SD        : num  0.0211 0.0244 0.023 0.0329 0.0491 ...
##  $ CTRL_Y_RND2_SD        : num  0.04985 0.00234 0.09563 0.04308 0.00885 ...
##  $ CTRL_Y_RND1_2_MEAN    : num  0.0185 0.0237 -0.0501 0.0624 -0.0729 ...
##  $ CTRL_Y_RND1_2_MED     : num  0.028 0.0111 -0.0088 0.0645 -0.0766 ...
##  $ CTRL_Y_RND1_2_SD      : num  0.0377 0.0224 0.085 0.0423 0.032 ...
##  $ AA_Y_RND1_R1          : num  0.0672 0.0106 0.1785 0.272 -2.1 ...
##  $ AA_Y_RND1_R2          : num  -0.3832 0.0102 0.1196 0.232 -1.1873 ...
##  $ AA_Y_RND1_R3          : num  -0.1599 -0.0465 -0.0381 0.1036 -1.0143 ...
##  $ AA_Y_RND2_R1          : num  0.0503 0.3083 -0.2434 0.0968 -0.4223 ...
##  $ AA_Y_RND2_R2          : num  -0.0505 0.4391 -0.2526 -0.094 -0.304 ...
##  $ AA_Y_RND2_R3          : num  -0.00342 0.52795 -0.20049 -0.02924 -0.26528 ...
##  $ AA_Y_RND1_MEAN        : num  -0.15864 -0.00857 0.08665 0.20251 -1.43385 ...
##  $ AA_Y_RND2_MEAN        : num  -0.00121 0.42511 -0.23216 -0.00879 -0.33052 ...
##  $ AA_Y_RND1_SD          : num  0.2252 0.0329 0.112 0.088 0.5834 ...
##  $ AA_Y_RND2_SD          : num  0.0504 0.1105 0.0278 0.097 0.0818 ...
##  $ AA_Y_RND1_2_MEAN      : num  -0.0799 0.2083 -0.0728 0.0969 -0.8822 ...
##  $ AA_Y_RND1_2_MED       : num  -0.027 0.159 -0.119 0.1 -0.718 ...
##  $ AA_Y_RND1_2_SD        : num  0.17 0.248 0.189 0.142 0.71 ...
##  $ LPI_Y_RND1_R1         : num  0.0122 -0.0467 0.1595 0.1942 -1.9849 ...
##  $ LPI_Y_RND1_R2         : num  -0.3963 -0.0369 0.1066 0.1757 -1.1149 ...
##  $ LPI_Y_RND1_R3         : num  -0.1905 -0.0575 -0.0145 -0.0172 -0.9972 ...
##  $ LPI_Y_RND2_R1         : num  0.0104 0.297 -0.1012 0.0363 -0.3547 ...
##  $ LPI_Y_RND2_R2         : num  -0.0758 0.4325 -0.0797 -0.0842 -0.2196 ...
##  $ LPI_Y_RND2_R3         : num  0.0494 0.519 -0.2065 -0.0978 -0.1844 ...
##  $ LPI_Y_RND1_MEAN       : num  -0.1915 -0.047 0.0839 0.1175 -1.3657 ...
##  $ LPI_Y_RND2_MEAN       : num  -0.00536 0.41618 -0.12915 -0.04856 -0.2529 ...
##  $ LPI_Y_RND1_SD         : num  0.2042 0.0103 0.0892 0.1171 0.5395 ...
##  $ LPI_Y_RND2_SD         : num  0.0641 0.1119 0.0678 0.0738 0.0899 ...
##  $ LPI_Y_RND1_2_MEAN     : num  -0.0984 0.1846 -0.0226 0.0345 -0.8093 ...
##  $ LPI_Y_RND1_2_MED      : num  -0.03273 0.13006 -0.04712 0.00953 -0.67593 ...
##  $ LPI_Y_RND1_2_SD       : num  0.169 0.263 0.137 0.126 0.701 ...
##  $ CTRL_GT_MEAN_RND1_2_SD: num  0.01521 0.00811 0.0394 0.02945 0.00214 ...
##  $ AA_GT_MEAN_RND1_2_SD  : num  0.0241 0.1676 0.0846 0.1297 0.1867 ...
##  $ LPI_GT_MEAN_RND1_2_SD : num  0.00892 0.15948 0.12395 0.10026 0.18459 ...
##  $ P_value_M1            : num  0.178 0.594 0.814 0.494 0.33 ...
##  $ P.adjusted_M1         : num  0.91 0.911 0.952 0.91 0.91 ...
EXTRACT CRISPRi CONTROL STRAINS DATA

Extract the CRISPRi-control strains LPI GT data from ROUND1 and ROUND2, respectively and store the output in two different vectors.

  • ROUND 1
CRISPRi_Ctrl_Round1 <- whole_data_CRISPRi_aa_2$LPI_GT[which(whole_data_CRISPRi_aa_2$Control.gRNA == 1 
                                                            & whole_data_CRISPRi_aa_2$Round_ID=="1st_round")]
summary(CRISPRi_Ctrl_Round1)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.06721  0.09108  0.16819  0.15842  0.20944  0.38829
  • ROUND 2
CRISPRi_Ctrl_Round2 <- whole_data_CRISPRi_aa_2$LPI_GT[which(whole_data_CRISPRi_aa_2$Control.gRNA == 1 
                                                           & whole_data_CRISPRi_aa_2$Round_ID=="2nd_round")]
summary(CRISPRi_Ctrl_Round2)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.21609 -0.07291 -0.02904 -0.01828  0.04256  0.15629
ESTIMATE P-VALUES FOR ROUND 1 AND 2

The P-value is estimated with Welch’s two-sample, two-sided t-test (an adaptation of Student’s t-test).

for(i in 1:nrow(Analysis_Final_2)){
  test1 <- t(Analysis_Final_2[i, 35:37])
  test2 <- t(Analysis_Final_2[i, 38:40])
  if(sum(!is.na(test1[, 1]))>=2){
    P_value_RND1 <- t.test(CRISPRi_Ctrl_Round1, test1[which(!is.na(test1[, 1]))])
    Analysis_Final_2[i, 92] <- P_value_RND1$p.value
  } else {
    Analysis_Final_2[i, 92] <- NA
  }
  if(sum(!is.na(test2[, 1]))>=2){
    P_value_RND2 <- t.test(CRISPRi_Ctrl_Round2, test2[which(!is.na(test2[, 1]))])
    Analysis_Final_2[i, 93] <- P_value_RND2$p.value
  } else {
    Analysis_Final_2[i, 93] <- NA
  }
}
colnames(Analysis_Final_2)[92:93] <- c("P_value_RND1_M2", "P_value_RND2_M2")
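
The loop above tests a strain only when at least two replicates in a round are non-missing. A minimal sketch of that guard as a standalone function follows; the name `welch_p()`, the `min_n` argument and the simulated control values are assumptions made for illustration.

```r
# Hypothetical helper mirroring the guard in the loop above: run the
# t-test only when at least `min_n` replicates are non-missing.
welch_p <- function(ref, reps, min_n = 2) {
  reps <- reps[!is.na(reps)]
  if (length(reps) < min_n) return(NA_real_)
  t.test(ref, reps)$p.value   # Welch (unequal variances) is the default
}

set.seed(1)
ctrl_toy <- rnorm(50, mean = 0.15, sd = 0.08)   # toy control values

p_three <- welch_p(ctrl_toy, c(0.60, 0.55, 0.65))  # three replicates
p_two   <- welch_p(ctrl_toy, c(0.60, NA, 0.55))    # two replicates, still tested
p_one   <- welch_p(ctrl_toy, c(0.60, NA, NA))      # one replicate, returns NA
c(p_three, p_two, p_one)
```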
FALSE DISCOVERY RATE ADJUSTMENT OF P-VALUES FOR ROUND 1 AND 2

P-value adjustment by BENJAMINI-HOCHBERG False Discovery Rate (FDR) method

Analysis_Final_2[which(!is.na(Analysis_Final_2$P_value_RND1_M2)), 94] <- p.adjust(Analysis_Final_2$P_value_RND1_M2[which(!is.na(Analysis_Final_2$P_value_RND1_M2))], 
                                                                                  method = "BH", 
                                                                                  n = length(Analysis_Final_2$P_value_RND1_M2[which(!is.na(Analysis_Final_2$P_value_RND1_M2))]))

Analysis_Final_2[which(!is.na(Analysis_Final_2$P_value_RND2_M2)), 95] <- p.adjust(Analysis_Final_2$P_value_RND2_M2[which(!is.na(Analysis_Final_2$P_value_RND2_M2))], 
                                                                                  method = "BH", 
                                                                                  n = length(Analysis_Final_2$P_value_RND2_M2[which(!is.na(Analysis_Final_2$P_value_RND2_M2))]))

colnames(Analysis_Final_2)[94:95] <- c("P.adjusted_RND1_M2", "P.adjusted_RND2_M2")
P-VALUE DIAGNOSTICS FOR METHOD 2: ROUND 1

NUMBER OF SIGNIFICANT STRAINS

length(Analysis_Final_2$P_value_RND1_M2[which(Analysis_Final_2$P_value_RND1_M2<=0.05)])
## [1] 4601
length(Analysis_Final_2$P.adjusted_RND1_M2[which(Analysis_Final_2$P.adjusted_RND1_M2<=0.05)])
## [1] 3389
length(Analysis_Final_2$P_value_RND1_M2[which(Analysis_Final_2$P_value_RND1_M2<=0.1)])
## [1] 5635
length(Analysis_Final_2$P.adjusted_RND1_M2[which(Analysis_Final_2$P.adjusted_RND1_M2<=0.1)])
## [1] 4692

P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS ROUND 1

Figure 4: P-value diagnostic by histogram, Method 2, Round 1

P-VALUE DIAGNOSTICS FOR METHOD 2: ROUND 2

NUMBER OF SIGNIFICANT STRAINS

length(Analysis_Final_2$P_value_RND2_M2[which(Analysis_Final_2$P_value_RND2_M2<=0.05)])
## [1] 2304
length(Analysis_Final_2$P.adjusted_RND2_M2[which(Analysis_Final_2$P.adjusted_RND2_M2<=0.05)])
## [1] 987
length(Analysis_Final_2$P_value_RND2_M2[which(Analysis_Final_2$P_value_RND2_M2<=0.1)])
## [1] 3174
length(Analysis_Final_2$P.adjusted_RND2_M2[which(Analysis_Final_2$P.adjusted_RND2_M2<=0.1)])
## [1] 1431

P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS ROUND 2

Figure 5: P-value diagnostic by histogram, Method 2, Round 2
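
Method 2 retains only strains that are significant in both rounds. The selection step itself is not shown in this section; a minimal sketch on toy adjusted P-values is given below, assuming a single threshold `alpha` for both rounds (the variable names are illustrative).

```r
# Toy adjusted p-values for five strains, for illustration only
p_adj_r1 <- c(0.01, 0.02, 0.30, NA,   0.04)
p_adj_r2 <- c(0.03, 0.20, 0.01, 0.01, NA)

alpha <- 0.05   # assumed common threshold for both rounds

# which() drops NA comparisons, so a strain that could not be tested
# in either round is excluded
sig_both <- which(p_adj_r1 <= alpha & p_adj_r2 <= alpha)
sig_both
```

Only the first toy strain passes, since every other strain is either non-significant or untested in one of the rounds.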

CONCLUSIONS METHOD 2

It is a robust statistical method. However, a major problem with this method is that it requires setting different thresholds for the adjusted P-values and the LPI GT mean in each round.

METHOD 3 AND METHOD 4

For METHOD 3, we hypothesized that the difference between the mean (µ) phenotypic performance of a specific CRISPRi strain (StrainX), considering all technical replicates (three per round) in the two independent experimental rounds (i.e. n=6), and the mean phenotypic performance of all CRISPRi strains that fall within the interquartile range (IQR) of the complete dataset would be zero, and that any difference within the IQR arises by chance.

Null Hypothesis : µStrainX(All_replicates_LPI_GT)- µ(InterquartileRange_LPI_GT) = 0

Additionally, we tested one final statistical model to determine the significance of our observations.

For METHOD 4, we hypothesized that the difference between the mean (µ) phenotypic performance of a specific CRISPRi strain (StrainX), considering all technical replicates (three per round) in the two independent experimental rounds (i.e. n=6), and the mean phenotypic performance of all CRISPRi control strains (with a gRNA targeting no genetic locus in S. cerevisiae) would be zero, and that any difference within the phenotypic performance range of the CRISPRi control strains (LPI GT range) arises by chance.

Null Hypothesis : µStrainX(All_replicates_LPI_GT) - µCRISPRi_Control_Strains(LPI_GT) = 0

To avoid distorting the original dataset, we copy the analysis dataset under a new name.

Analysis_Final_3 <- Analysis_Final_2
EXTRACT ALL LPI GT DATA POINTS (INCLUDING ALL REPLICATES) WITHIN INTER-QUARTILE-RANGE (IQR)

Since we consider all replicates this time, we compare them with all replicates (not the means) that fall within the IQR for Method 3. For this purpose, we extract the IQR dataset including all replicate data for each strain. We use the data.frame Data_CRISPRi_aa (see REMOVE ROWS WITH SPATIAL CONTROL STRAIN DATA) to extract this numeric vector.

BOX PLOT - RELATIVE GENERATION TIME (LPI GT)

Figure 6: Boxplot of relative generation time (LPI GT) for all strains including all replicates in the library

Display Box-plot statistics

boxplot_stat_LPI_GT$stats 
##             [,1]
## [1,] -0.25630118
## [2,] -0.04373846
## [3,]  0.02045212
## [4,]  0.09804938
## [5,]  0.31050530
  • 25th Percentile = -0.04373846
  • 75th Percentile = 0.09804938

We therefore extract the data points within the IQR:

Intermediate_50_M3 <- Data_CRISPRi_aa$LPI_GT[which(Data_CRISPRi_aa$LPI_GT >=-0.04373846
                                                   &Data_CRISPRi_aa$LPI_GT<=0.09804938)]
summary(Intermediate_50_M3)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.04374 -0.01004  0.02045  0.02236  0.05325  0.09805
EXTRACT CRISPRi CONTROL STRAINS DATA (ALL REPLICATES)

This time we extract all the replicate data (not the means) of each CRISPRi control strain for Method 4.

Crispri_control_M4 <- Data_CRISPRi_aa$LPI_GT[which(Data_CRISPRi_aa$Control.gRNA==1)]
summary(Crispri_control_M4)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.21609 -0.03042  0.06889  0.07007  0.16628  0.38829
RECALCULATE THE LSC GT MEAN AT BASAL CONDITION AND LPI GT MEAN OF EACH STRAIN

We recalculate the above parameters taking all six replicates into account and excluding missing values, using an if/else decision tree. A strain thus receives an LSC GT / LPI GT value if at least one replicate managed to grow in a given condition; otherwise a missing value (NA) is returned.

Additionally, we create two columns that show the number of replicates of a strain that managed to grow in the basal condition (n_CTRL) and in the acetic acid condition (n_LPI).

for(i in 1:nrow(Analysis_Final_3)){
  test1 <- t(Analysis_Final_3[i, 9:14])
  test2 <- t(Analysis_Final_3[i, 35:40])
  x1 <- sum(!is.na(test1[, 1]))
  x2 <- sum(!is.na(test2[, 1]))
  CTRL_GT_Mean_temp <- mean(test1[which(!is.na(test1[, 1]))])
  LPI_GT_Mean_temp <- mean(test2[which(!is.na(test2[, 1]))])
  Analysis_Final_3[i, 96] <- CTRL_GT_Mean_temp
  Analysis_Final_3[i, 97] <- x1
  Analysis_Final_3[i, 98] <- LPI_GT_Mean_temp
  Analysis_Final_3[i, 99] <- x2
}
colnames(Analysis_Final_3)[96:99] <- c("CTRL_GT_Mean_all", "n_CTRL", "LPI_GT_Mean_all", "n_LPI")
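The row-wise loop above can also be written without an explicit loop. This is a minimal sketch on a toy data frame; `reps`, `n_grown` and `mean_all` are hypothetical names and the values are made up. It assumes the six replicate columns are numeric, with NA marking a replicate that did not grow.

```r
# Toy data: 3 strains x 6 replicates (hypothetical values); NA = no growth
reps <- data.frame(
  R1 = c(0.10, NA, NA),
  R2 = c(0.12, 0.30, NA),
  R3 = c(0.08, NA, NA),
  R4 = c(0.11, NA, NA),
  R5 = c(NA, 0.34, NA),
  R6 = c(0.09, NA, NA)
)

# Number of replicates that grew, per strain
n_grown <- rowSums(!is.na(reps))

# Mean over the replicates that grew; a strain with no growth gives NaN,
# which is.na() also reports as missing
mean_all <- rowMeans(reps, na.rm = TRUE)

data.frame(n_grown, mean_all)
```

In the real table the same two calls would be applied to the column ranges used in the loop (for example columns 9:14 and 35:40).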
ESTIMATE P-VALUES FOR METHOD 3 AND 4

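P-values are estimated with Welch's two-sample, two-sided t-test (an adaptation of Student’s t-test that does not assume equal variances). Strains with fewer than three growing replicates under acetic acid are not tested and receive NA.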

for(i in 1:nrow(Analysis_Final_3)){
  test <- t(Analysis_Final_3[i, 35:40])
  x <- sum(!is.na(test[, 1]))
  if(x>2){
    P.value_temp_M3 <- t.test(Intermediate_50_M3, test[which(!is.na(test[, 1]))])
    P.value_temp_M4 <- t.test(Crispri_control_M4, test[which(!is.na(test[, 1]))])
    Analysis_Final_3[i, 100] <- P.value_temp_M3$p.value
    Analysis_Final_3[i, 101] <- P.value_temp_M4$p.value
  } else {
    Analysis_Final_3[i, 100] <- NA
    Analysis_Final_3[i, 101] <- NA
  }
}
colnames(Analysis_Final_3)[100:101] <- c("P.value_M3", "P.value_M4")
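The per-strain test can be isolated in a small helper, which makes the "at least three replicates" rule explicit. This is a sketch under stated assumptions: `welch_p`, `reference` and `strain_reps` are hypothetical names, and the numbers are simulated. It relies on the defaults of `t.test()`, which are a two-sided alternative and `var.equal = FALSE` (Welch).

```r
# Simulated reference population (stand-in for Intermediate_50_M3)
set.seed(1)
reference <- rnorm(50, mean = 0.02, sd = 0.04)

# One strain with six replicates; NA = replicate did not grow
strain_reps <- c(-0.60, -0.85, -0.78, -0.19, NA, -0.17)

welch_p <- function(reference, reps, min_n = 3) {
  reps <- reps[!is.na(reps)]
  # Too few observations: skip the test and report a missing p-value
  if (length(reps) < min_n) return(NA_real_)
  # Defaults: alternative = "two.sided", var.equal = FALSE (Welch)
  t.test(reference, reps)$p.value
}

p_tested  <- welch_p(reference, strain_reps)
p_skipped <- welch_p(reference, c(0.1, NA, NA, NA, 0.2, NA))  # only 2 grew
c(p_tested, p_skipped)
```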
FALSE DISCOVERY RATE ADJUSTMENT OF P-VALUES FOR METHOD 3 AND 4

P-value adjustment by BENJAMINI-HOCHBERG False Discovery Rate (FDR) method

Analysis_Final_3[which(!is.na(Analysis_Final_3$P.value_M3)), 102] <- p.adjust(Analysis_Final_3$P.value_M3[which(!is.na(Analysis_Final_3$P.value_M3))], 
                                                                              method = "BH", 
                                                                              n = length(Analysis_Final_3$P.value_M3[which(!is.na(Analysis_Final_3$P.value_M3))]))
Analysis_Final_3[which(!is.na(Analysis_Final_3$P.value_M4)), 103] <- p.adjust(Analysis_Final_3$P.value_M4[which(!is.na(Analysis_Final_3$P.value_M4))], 
                                                                              method = "BH", 
                                                                              n = length(Analysis_Final_3$P.value_M4[which(!is.na(Analysis_Final_3$P.value_M4))]))
colnames(Analysis_Final_3)[102:103] <- c("P.adjusted_M3", "P.adjusted_M4")
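To make the adjustment step concrete, here is a small worked sketch of the Benjamini-Hochberg calculation on hypothetical p-values (not study data). Untested strains (NA) are set aside first, as in the code above, and the hand calculation is compared with `p.adjust()`.

```r
# Hypothetical raw p-values; NA marks a strain that was not tested
p_raw <- c(0.001, NA, 0.01, 0.03, 0.04, 0.20)

tested <- which(!is.na(p_raw))
p <- p_raw[tested]
n <- length(p)

# BH by hand: scale each p-value by n / rank, then enforce monotonicity
# by taking the running minimum from the largest p-value downwards
o <- order(p, decreasing = TRUE)
bh <- pmin(1, cummin(n / (n:1) * p[o]))[order(o)]

# Put the adjusted values back in their original positions
p_adj <- rep(NA_real_, length(p_raw))
p_adj[tested] <- bh

# The built-in function gives the same result on the tested subset
all.equal(bh, p.adjust(p, method = "BH"))
p_adj
```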
P-VALUE DIAGNOSTICS FOR METHOD 3

NUMBER OF SIGNIFICANT STRAINS

length(Analysis_Final_3$P.value_M3[which(Analysis_Final_3$P.value_M3<=0.05)])
## [1] 2468
length(Analysis_Final_3$P.adjusted_M3[which(Analysis_Final_3$P.adjusted_M3<=0.05)])
## [1] 514
length(Analysis_Final_3$P.value_M3[which(Analysis_Final_3$P.value_M3<=0.1)])
## [1] 3392
length(Analysis_Final_3$P.adjusted_M3[which(Analysis_Final_3$P.adjusted_M3<=0.1)])
## [1] 1258

P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS

Figure 7 (Fig S11 in Manuscript): P-value diagnostic by histogram, Method 3

P-VALUE DIAGNOSTICS FOR METHOD 4

NUMBER OF SIGNIFICANT STRAINS

length(Analysis_Final_3$P.value_M4[which(Analysis_Final_3$P.value_M4<=0.05)])
## [1] 3663
length(Analysis_Final_3$P.adjusted_M4[which(Analysis_Final_3$P.adjusted_M4<=0.05)])
## [1] 2212
length(Analysis_Final_3$P.value_M4[which(Analysis_Final_3$P.value_M4<=0.1)])
## [1] 4545
length(Analysis_Final_3$P.adjusted_M4[which(Analysis_Final_3$P.adjusted_M4<=0.1)])
## [1] 3306

P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS

Figure 8: P-value diagnostic by histogram, Method 4

CONCLUSIONS METHOD 3

The P-values generated by Method 3 can be corrected efficiently using the FDR method, and after correction the adjusted P-values have a nearly equal distribution, which is indicative of a robust statistical outcome. Therefore, Method 3 is a good statistical method for this dataset.

CONCLUSIONS METHOD 4

Although Method 4 is effective at identifying the candidates that deviate most from the CRISPRi control mean, the FDR method is less effective on the P-values it generates. Method 4 is therefore less efficient for the current dataset. Moreover, the CRISPRi control strains, for some reason, consistently displayed slower growth under acetic acid than the population mean. This biased the candidate selection of Method 4.

FINAL CONCLUSION FOR STATISTICAL ANALYSIS

Out of the 4 statistical methods evaluated, METHOD 3 was the most promising method to identify the significant candidates. Therefore, for this study we considered the results of statistical Method 3 for further downstream analysis.

SETTING THE STATISTICAL AND EFFECT SIZE THRESHOLDS

Number of strains with Adjusted P-value ≤ 0.1

length(Analysis_Final_3$P.adjusted_M3[which(Analysis_Final_3$P.adjusted_M3 <= 0.1)])
## [1] 1258

To avoid missing potential candidates merely because of high variability among the replicates, we keep the adjusted P-value threshold less strict, i.e. ≤ 0.1. In addition, we introduce an effect size threshold, i.e. the phenotypic performance range of the CRISPRi control strains.

  • Estimating the Effect size threshold to identify acetic acid sensitive candidates
max(Analysis_Final_3$LPI_GT_Mean_all[which(Analysis_Final_3$Control.gRNA==1)])
## [1] 0.165662

Therefore, any strain that has an adjusted P-value ≤ 0.1 AND a mean LPI GT > 0.165662 will be considered SENSITIVE to acetic acid.

  • Estimating the Effect size threshold to identify acetic acid tolerant candidates
min(Analysis_Final_3$LPI_GT_Mean_all[which(Analysis_Final_3$Control.gRNA==1)])
## [1] -0.03680838

Therefore, any strain that has an adjusted P-value ≤ 0.1 AND a mean LPI GT < -0.03680838 will be considered TOLERANT to acetic acid.
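The combined rule (adjusted P-value plus effect size relative to the control range) can be sketched as follows. The data frame `toy` is a hypothetical stand-in for `Analysis_Final_3` with made-up values, and the `call` column and the use of `na.rm = TRUE` are illustrative assumptions, not part of the original analysis.

```r
# Toy stand-in for Analysis_Final_3 (hypothetical values)
toy <- data.frame(
  gRNA_name       = c("ctrl-1", "ctrl-2", "g1", "g2", "g3", "g4"),
  Control.gRNA    = c(1, 1, 0, 0, 0, 0),
  LPI_GT_Mean_all = c(-0.03, 0.16, -0.30, 0.40, 0.50, 0.05),
  P.adjusted_M3   = c(0.80, 0.60, 0.01, 0.02, 0.30, 0.04),
  stringsAsFactors = FALSE
)

# Effect size thresholds: the performance range of the control strains
ctrl  <- toy$LPI_GT_Mean_all[toy$Control.gRNA == 1]
upper <- max(ctrl, na.rm = TRUE)
lower <- min(ctrl, na.rm = TRUE)

# Statistical threshold on the adjusted P-value
significant <- !is.na(toy$P.adjusted_M3) & toy$P.adjusted_M3 <= 0.1

# A strain must pass both criteria to be called
toy$call <- "none"
toy$call[which(significant & toy$LPI_GT_Mean_all > upper)] <- "SENSITIVE"
toy$call[which(significant & toy$LPI_GT_Mean_all < lower)] <- "TOLERANT"
toy[, c("gRNA_name", "call")]
```

In this toy example g3 has a large effect but fails the P-value criterion, and g4 is significant but lies within the control range, so neither is called.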

EXTRACT THE ACETIC ACID TOLERANT STRAINS

  • Extract the row indices that satisfy the statistical (adjusted P-value ≤ 0.1) and effect size (mean LPI GT < -0.03680838) criteria for acetic acid tolerant candidates
candidate_padj_0.1_FIT_M3 <- which((Analysis_Final_3$LPI_GT_Mean_all < -0.03680838 & Analysis_Final_3$P.adjusted_M3<= 0.1))
length(candidate_padj_0.1_FIT_M3)
## [1] 478

This gives 478 ACETIC ACID TOLERANT strains

  • Extract the row data of acetic acid tolerant strains
Fit_M3_complete <- Analysis_Final_3[candidate_padj_0.1_FIT_M3, ]
Fit_M3_complete <- Fit_M3_complete[order(Fit_M3_complete$LPI_GT_Mean_all, decreasing = FALSE), ]
str(Fit_M3_complete)
## 'data.frame':    478 obs. of  103 variables:
##  $ gRNA_name             : chr  "RPN9-TRg-4" "RGL1-NRg-7" "RPN9-NRg-7" "POP3-NRg-5" ...
##  $ Seq                   : chr  "ACCCGCTCCCCGCTTTCATC" "GCTCTTGTTTAGTAGGCGTG" "ACCGGATGAAAGCGGGGAGC" "CAAATATCCGCCCTGGCAAT" ...
##  $ SOURCEPLATEID         : chr  "R2877.H.021" "R2877.H.002" "R2877.H.020" "R2877.H.022" ...
##  $ SOURCECOLONYCOLUMN    : int  1 3 7 19 4 3 18 7 1 3 ...
##  $ SOURCECOLONYROW       : chr  "B" "P" "N" "C" ...
##  $ GENE                  : chr  "RPN9" "RGL1" "RPN9" "POP3" ...
##  $ Control.gRNA          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536         : chr  "C1" "AE5" "AA13" "E37" ...
##  $ CTRL_GT_RND1_R1       : num  0.07603 -0.05695 0.02247 -0.01067 0.00419 ...
##  $ CTRL_GT_RND1_R2       : num  0.08332 -0.06969 -0.02313 0.01964 0.00952 ...
##  $ CTRL_GT_RND1_R3       : num  0.16235 -0.00953 0.00817 0.06949 -0.03925 ...
##  $ CTRL_GT_RND2_R1       : num  -0.0229 -0.01756 0.08764 0.0332 0.00409 ...
##  $ CTRL_GT_RND2_R2       : num  0.01055 -0.01944 -0.0114 0.02561 0.00428 ...
##  $ CTRL_GT_RND2_R3       : num  -0.00635 -0.06164 0.0289 0.03377 -0.01342 ...
##  $ CTRL_GT_RND1_MEAN     : num  0.10723 -0.04539 0.00251 0.02615 -0.00851 ...
##  $ CTRL_GT_RND2_MEAN     : num  -0.00624 -0.03288 0.03505 0.03086 -0.00168 ...
##  $ CTRL_GT_RND1_SD       : num  0.0479 0.0317 0.0233 0.0405 0.0267 ...
##  $ CTRL_GT_RND2_SD       : num  0.01672 0.02492 0.0498 0.00455 0.01016 ...
##  $ CTRL_GT_RND1_2_MEAN   : num  0.0505 -0.0391 0.0188 0.0285 -0.0051 ...
##  $ CTRL_GT_RND1_2_MED    : num  0.04329 -0.0382 0.01532 0.0294 0.00414 ...
##  $ CTRL_GT_RND1_2_SD     : num  0.0699 0.0264 0.0391 0.0259 0.0185 ...
##  $ AA_GT_RND1_R1         : num  -0.525 -0.44 -0.386 -0.34 -0.263 ...
##  $ AA_GT_RND1_R2         : num  -0.767 -0.484 -0.488 -0.312 -0.257 ...
##  $ AA_GT_RND1_R3         : num  -0.621 -0.338 -0.4 -0.375 -0.363 ...
##  $ AA_GT_RND2_R1         : num  -0.2124 -0.1735 0.0426 -0.107 -0.2219 ...
##  $ AA_GT_RND2_R2         : num  -0.172 -0.2 -0.12 -0.176 -0.256 ...
##  $ AA_GT_RND2_R3         : num  -0.1797 -0.2696 -0.1352 -0.0916 -0.1875 ...
##  $ AA_GT_RND1_MEAN       : num  -0.638 -0.421 -0.425 -0.342 -0.294 ...
##  $ AA_GT_RND2_MEAN       : num  -0.1881 -0.2144 -0.0709 -0.1249 -0.2218 ...
##  $ AA_GT_RND1_SD         : num  0.1214 0.0746 0.055 0.0317 0.0595 ...
##  $ AA_GT_RND2_SD         : num  0.0214 0.0496 0.0985 0.045 0.0343 ...
##  $ AA_GT_RND1_2_MEAN     : num  -0.413 -0.318 -0.248 -0.234 -0.258 ...
##  $ AA_GT_RND1_2_MED      : num  -0.369 -0.304 -0.261 -0.244 -0.257 ...
##  $ AA_GT_RND1_2_SD       : num  0.2583 0.1265 0.2066 0.124 0.0588 ...
##  $ LPI_GT_RND1_R1        : num  -0.602 -0.383 -0.409 -0.329 -0.267 ...
##  $ LPI_GT_RND1_R2        : num  -0.85 -0.414 -0.465 -0.332 -0.267 ...
##  $ LPI_GT_RND1_R3        : num  -0.783 -0.329 -0.408 -0.445 -0.324 ...
##  $ LPI_GT_RND2_R1        : num  -0.1895 -0.156 -0.0451 -0.1402 -0.226 ...
##  $ LPI_GT_RND2_R2        : num  -0.183 -0.181 -0.109 -0.202 -0.26 ...
##  $ LPI_GT_RND2_R3        : num  -0.173 -0.208 -0.164 -0.125 -0.174 ...
##  $ LPI_GT_RND1_MEAN      : num  -0.745 -0.375 -0.427 -0.368 -0.286 ...
##  $ LPI_GT_RND2_MEAN      : num  -0.182 -0.181 -0.106 -0.156 -0.22 ...
##  $ LPI_GT_RND1_SD        : num  0.1285 0.0432 0.0324 0.0661 0.0328 ...
##  $ LPI_GT_RND2_SD        : num  0.00809 0.02602 0.05954 0.04043 0.04341 ...
##  $ LPI_GT_RND1_2_MEAN    : num  -0.463 -0.278 -0.267 -0.262 -0.253 ...
##  $ LPI_GT_RND1_2_MED     : num  -0.395 -0.268 -0.286 -0.265 -0.263 ...
##  $ LPI_GT_RND1_2_SD      : num  0.319 0.1109 0.1812 0.1264 0.0497 ...
##  $ CTRL_Y_RND1_R1        : num  0.1577 -0.2668 -0.0429 0.2105 0.0193 ...
##  $ CTRL_Y_RND1_R2        : num  -0.0673 -0.2514 -0.0582 0.1997 0.0202 ...
##  $ CTRL_Y_RND1_R3        : num  0.2922 -0.0261 -0.0683 0.1684 0.058 ...
##  $ CTRL_Y_RND2_R1        : num  0.0912 -0.0829 -0.0174 0.0119 0.0743 ...
##  $ CTRL_Y_RND2_R2        : num  -0.19136 -0.09411 -0.00691 0.11114 0.07309 ...
##  $ CTRL_Y_RND2_R3        : num  0.295 -0.0282 -0.0197 -0.0138 0.0828 ...
##  $ CTRL_Y_RND1_MEAN      : num  0.1275 -0.1815 -0.0565 0.1929 0.0325 ...
##  $ CTRL_Y_RND2_MEAN      : num  0.0649 -0.0684 -0.0147 0.0364 0.0767 ...
##  $ CTRL_Y_RND1_SD        : num  0.1816 0.1347 0.0128 0.0219 0.0221 ...
##  $ CTRL_Y_RND2_SD        : num  0.24423 0.03528 0.00683 0.06596 0.00529 ...
##  $ CTRL_Y_RND1_2_MEAN    : num  0.0962 -0.1249 -0.0356 0.1147 0.0546 ...
##  $ CTRL_Y_RND1_2_MED     : num  0.1245 -0.0885 -0.0313 0.1398 0.0656 ...
##  $ CTRL_Y_RND1_2_SD      : num  0.1955 0.1077 0.0247 0.0963 0.0282 ...
##  $ AA_Y_RND1_R1          : num  1.696 -0.399 1.145 1.162 0.45 ...
##  $ AA_Y_RND1_R2          : num  1.589 -0.333 1.135 1.103 0.45 ...
##  $ AA_Y_RND1_R3          : num  1.528 -0.161 1.087 1.2 0.543 ...
##  $ AA_Y_RND2_R1          : num  0.644 0.28 0.11 0.187 0.405 ...
##  $ AA_Y_RND2_R2          : num  0.279 0.368 0.185 0.18 0.416 ...
##  $ AA_Y_RND2_R3          : num  0.461 0.606 0.27 0.147 0.494 ...
##  $ AA_Y_RND1_MEAN        : num  1.604 -0.298 1.122 1.155 0.481 ...
##  $ AA_Y_RND2_MEAN        : num  0.461 0.418 0.188 0.171 0.438 ...
##  $ AA_Y_RND1_SD          : num  0.0849 0.1229 0.0308 0.0486 0.0538 ...
##  $ AA_Y_RND2_SD          : num  0.1827 0.1685 0.0796 0.021 0.0484 ...
##  $ AA_Y_RND1_2_MEAN      : num  1.0327 0.0602 0.6551 0.6631 0.4594 ...
##  $ AA_Y_RND1_2_MED       : num  1.086 0.0597 0.6782 0.645 0.4497 ...
##  $ AA_Y_RND1_2_SD        : num  0.6389 0.4137 0.5143 0.5399 0.0514 ...
##  $ LPI_Y_RND1_R1         : num  1.538 -0.132 1.187 0.952 0.43 ...
##  $ LPI_Y_RND1_R2         : num  1.6563 -0.0818 1.1928 0.9034 0.4296 ...
##  $ LPI_Y_RND1_R3         : num  1.236 -0.135 1.155 1.031 0.485 ...
##  $ LPI_Y_RND2_R1         : num  0.553 0.363 0.128 0.175 0.331 ...
##  $ LPI_Y_RND2_R2         : num  0.4699 0.4621 0.1915 0.0684 0.3426 ...
##  $ LPI_Y_RND2_R3         : num  0.166 0.634 0.289 0.161 0.411 ...
##  $ LPI_Y_RND1_MEAN       : num  1.477 -0.116 1.178 0.962 0.448 ...
##  $ LPI_Y_RND2_MEAN       : num  0.396 0.486 0.203 0.135 0.361 ...
##  $ LPI_Y_RND1_SD         : num  0.2168 0.0299 0.0203 0.0644 0.0317 ...
##  $ LPI_Y_RND2_SD         : num  0.2035 0.1371 0.0813 0.0579 0.0432 ...
##  $ LPI_Y_RND1_2_MEAN     : num  0.937 0.185 0.691 0.548 0.405 ...
##  $ LPI_Y_RND1_2_MED      : num  0.894 0.141 0.722 0.539 0.42 ...
##  $ LPI_Y_RND1_2_SD       : num  0.621 0.3419 0.537 0.4564 0.0585 ...
##  $ CTRL_GT_MEAN_RND1_2_SD: num  0.08023 0.00885 0.02301 0.00333 0.00483 ...
##  $ AA_GT_MEAN_RND1_2_SD  : num  0.3179 0.146 0.2502 0.1537 0.0511 ...
##  $ LPI_GT_MEAN_RND1_2_SD : num  0.3981 0.1371 0.2272 0.1504 0.0463 ...
##  $ P_value_M1            : num  0.3346 0.1987 0.3234 0.228 0.0754 ...
##  $ P.adjusted_M1         : num  0.91 0.91 0.91 0.91 0.91 ...
##  $ P_value_RND1_M2       : num  5.58e-03 2.47e-04 9.67e-06 2.50e-03 3.55e-05 ...
##  $ P_value_RND2_M2       : num  5.25e-17 6.93e-04 1.15e-01 1.47e-02 7.13e-03 ...
##  $ P.adjusted_RND1_M2    : num  0.02164 0.002034 0.000126 0.01246 0.000397 ...
##  $ P.adjusted_RND2_M2    : num  2.24e-14 1.12e-02 3.06e-01 9.49e-02 5.98e-02 ...
##  $ CTRL_GT_Mean_all      : num  0.0505 -0.0391 0.0188 0.0285 -0.0051 ...
##  $ n_CTRL                : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ LPI_GT_Mean_all       : num  -0.463 -0.278 -0.267 -0.262 -0.253 ...
##  $ n_LPI                 : int  6 6 6 6 6 6 6 6 6 6 ...
##   [list output truncated]

EXTRACT ALL CRISPRi TARGET GENES THAT INDUCED ACETIC ACID TOLERANCE

  • Extract the descriptions of all genes (1617 genes) involved in this study from the Saccharomyces Genome Database (SGD). A .csv file for this purpose already exists in the COMPILED_DATA folder.

Gene description key file: Gene_List_CRISPRi_lib.csv

whole_Gene_list_Final <- read.csv("COMPILED_DATA/Gene_List_CRISPRi_lib.csv", na.strings = "", stringsAsFactors = FALSE)
rownames(whole_Gene_list_Final) <- whole_Gene_list_Final$LIB_ID
  • Next, prepare a data.frame with descriptions of the CRISPRi target genes that induced acetic acid tolerance. This data.frame also includes how many gRNAs per target gene induced acetic acid tolerance.
Fit_all_M3 <- data.frame(sort(table(Analysis_Final_3$GENE[candidate_padj_0.1_FIT_M3]), decreasing = TRUE))
y <- as.character(Fit_all_M3$Var1)
x <- whole_Gene_list_Final[y, ]
Fit_all_M3_description <- cbind(Fit_all_M3, x[, -1])
str(Fit_all_M3_description)
## 'data.frame':    370 obs. of  8 variables:
##  $ Var1       : Factor w/ 370 levels "PEP7","RPN9",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Freq       : int  5 5 5 4 3 3 3 3 3 3 ...
##  $ SGD_DB_ID  : chr  "S000002731" "S000002835" "S000001899" "S000006069" ...
##  $ SYS_ID     : chr  "YDR323C" "YDR427W" "YFR003C" "YPL148C" ...
##  $ GENE_SYM   : chr  "PEP7" "RPN9" "YPI1" "PPT2" ...
##  $ NAME       : chr  "carboxyPEPtidase Y-deficient" "Regulatory Particle Non-ATPase" "Yeast Phosphatase Inhibitor" "Phosphopantetheine:Protein Transferase" ...
##  $ PHENOTYPE  : chr  NA "Non-essential gene; null mutant is sensitive to elevated temperatures, shows cell cycle arrest in metaphase, an"| __truncated__ NA NA ...
##  $ DESCRIPTION: chr  "Adaptor protein involved in vesicle-mediated vacuolar protein sorting; multivalent adaptor protein; facilitates"| __truncated__ "Non-ATPase regulatory subunit of the 26S proteasome; similar to putative proteasomal subunits in other species;"| __truncated__ "Regulatory subunit of the type I protein phosphatase (PP1) Glc7p; Glc7p participates in the regulation of a var"| __truncated__ "Phosphopantetheine:protein transferase (PPTase); activates mitochondrial acyl carrier protein (Acp1p) by phosph"| __truncated__ ...
nrow(Fit_all_M3_description)
## [1] 370

This gives 370 CRISPRi target genes that induced acetic acid TOLERANCE
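The counting and annotation step can be illustrated on a handful of genes. This is a minimal sketch: `hit_genes`, `gene_counts`, `gene_info` and the `NOTE` column are hypothetical, and the lookup table only mimics the idea of `whole_Gene_list_Final` being indexed by row name.

```r
# Target gene of each candidate strain (one entry per gRNA hit; toy data)
hit_genes <- c("RPN9", "PEP7", "RPN9", "POP3", "RPN9", "PEP7")

# Count how many gRNAs hit each gene, most frequent first
counts <- sort(table(hit_genes), decreasing = TRUE)
gene_counts <- data.frame(
  GENE    = names(counts),
  n_gRNAs = as.integer(counts),
  stringsAsFactors = FALSE
)

# Hypothetical description table, indexed by gene name via row names
gene_info <- data.frame(
  LIB_ID = c("PEP7", "POP3", "RPN9"),
  NOTE   = c("note for PEP7", "note for POP3", "note for RPN9"),
  stringsAsFactors = FALSE
)
rownames(gene_info) <- gene_info$LIB_ID

# Look up each gene by row name and attach its description
annotated <- cbind(gene_counts, gene_info[gene_counts$GENE, -1, drop = FALSE])
annotated
```

Indexing by row name keeps the descriptions in the same order as the sorted counts, so no separate merge or re-sorting is needed.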

EXTRACT THE ACETIC ACID SENSITIVE STRAINS

  • First, identify strains that grew well in the basal condition but for which none, or fewer than three (out of six), of the replicates managed to grow under acetic acid stress. We call these strains SUPER SENSITIVE. P-value estimation was not performed for these strains because n was ≤ 2.
super_sen_M3 <- Analysis_Final_3[which(!is.na(Analysis_Final_3$CTRL_GT_Mean_all)
                                       &(Analysis_Final_3$n_LPI<3)
                                       &(
                                         is.na(Analysis_Final_3$LPI_GT_Mean_all)
                                         |(Analysis_Final_3$LPI_GT_Mean_all> 0.165662)
                                       )
), ]
nrow(super_sen_M3)
## [1] 17

This gives 17 ACETIC ACID SUPER SENSITIVE strains
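The logic of this filter is easier to follow on a few rows. The sketch below uses a hypothetical toy table with made-up values; only the column names and the 0.165662 threshold come from the analysis above.

```r
# Toy stand-in for Analysis_Final_3 (hypothetical values)
toy <- data.frame(
  gRNA_name        = c("g1", "g2", "g3", "g4"),
  CTRL_GT_Mean_all = c(0.01, 0.02, NaN, 0.00),
  n_LPI            = c(0, 2, 0, 6),
  LPI_GT_Mean_all  = c(NaN, 0.90, NaN, 0.50),
  stringsAsFactors = FALSE
)
threshold <- 0.165662

# Grew in basal medium, fewer than 3 replicates grew in acetic acid, and
# either no acetic acid value at all or a value above the sensitivity cut-off
is_super <- !is.na(toy$CTRL_GT_Mean_all) &
  toy$n_LPI < 3 &
  (is.na(toy$LPI_GT_Mean_all) | toy$LPI_GT_Mean_all > threshold)

super_names <- toy$gRNA_name[which(is_super)]
super_names
```

Here g3 is excluded because it did not grow in the basal condition either, and g4 is excluded because all six replicates grew, so it is handled by the regular P-value route.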

  • Next, extract the row indices that satisfy the statistical (adjusted P-value ≤ 0.1) and effect size (mean LPI GT > 0.165662) criteria for acetic acid sensitive candidates
candidate_padj_0.1_SEN_M3 <- which((Analysis_Final_3$LPI_GT_Mean_all > 0.165662 & Analysis_Final_3$P.adjusted_M3<= 0.1))
length(candidate_padj_0.1_SEN_M3)
## [1] 481

This gives 481 ACETIC ACID SENSITIVE strains.

  • Extract the row data of acetic acid sensitive strains
Sen_M3_complete <- rbind(super_sen_M3, Analysis_Final_3[candidate_padj_0.1_SEN_M3, ])
Sen_M3_complete <- Sen_M3_complete[order(Sen_M3_complete$LPI_GT_Mean_all, decreasing = TRUE), ]
nrow(Sen_M3_complete)
## [1] 498

In TOTAL, 481+17 = 498 strains displayed acetic acid SENSITIVITY

EXTRACT ALL CRISPRi TARGET GENES THAT INDUCED ACETIC ACID SENSITIVITY

  • Prepare a data.frame with descriptions of the CRISPRi target genes that induced acetic acid sensitivity. This data.frame also includes how many gRNAs per target gene induced acetic acid sensitivity.
Sen_all_M3 <- data.frame(sort(table(c(Analysis_Final_3$GENE[candidate_padj_0.1_SEN_M3], super_sen_M3$GENE)), decreasing = TRUE))
y <- as.character(Sen_all_M3$Var1)
x <- whole_Gene_list_Final[y, ]
Sen_all_M3_description <- cbind(Sen_all_M3, x[, -1])
nrow(Sen_all_M3_description)
## [1] 367

This gives 367 CRISPRi target genes that induced acetic acid SENSITIVITY

GO ANALYSIS

Data preparation

Extracting the SGD_ID for the unique genes in Fit_all_M3 (see EXTRACT ALL CRISPRi TARGET GENES THAT INDUCED ACETIC ACID TOLERANCE)

Fit_unique_M3 <- as.character(Fit_all_M3$Var1)
x <- whole_Gene_list_Final[Fit_unique_M3, ]
Fit_unique_M3_SGD_ID <- x$SGD_DB_ID
str(Fit_unique_M3_SGD_ID)
##  chr [1:370] "S000002731" "S000002835" "S000001899" "S000006069" ...

Extracting the SGD_ID for the unique genes in Sen_all_M3 (see EXTRACT ALL CRISPRi TARGET GENES THAT INDUCED ACETIC ACID SENSITIVITY)

Sen_unique_M3 <- as.character(Sen_all_M3$Var1)
x <- whole_Gene_list_Final[Sen_unique_M3, ]
Sen_unique_M3_SGD_ID <- x$SGD_DB_ID
str(Sen_unique_M3_SGD_ID)
##  chr [1:367] "S000003191" "S000003105" "S000006015" "S000001283" ...

Perform the GO analysis with the above gene identifier sets in the Saccharomyces Genome Database (SGD).

DATA VISUALIZATION

Here we present the SCAN-O-MATIC data as graphs and charts.

PREREQUISITE PACKAGES

INSTALL

  • ggplot2
  • reshape
  • pheatmap
  • wordcloud

GROWTH CURVES

Plot some representative growth curves from scan-o-matic.

The growth curve data was generated by running the flatten_curves_2.py script (obtained from Simon Stenberg, Gothenburg University, Sweden, and available on request) in the scan-o-matic analysis folder generated within the project folder. The program then generates a curves_flat.csv file in that analysis folder. For the representative growth curves, we generated this curves_flat.csv for the project that has the growth output of plates 7 and 8 in the basal and acetic acid conditions in screening Round 1. The file was then renamed Data_for_Representative_GC_SOM.csv and is available in our COMPILED_DATA folder.

  • Import data
Growth_curve_data <- read.csv("COMPILED_DATA/Data_for_Representative_GC_SOM.csv", sep = "\t", header = TRUE)
str(Growth_curve_data)
## 'data.frame':    255 obs. of  6145 variables:
##  $ X       : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ X0_0_0  : num  21659 21516 21142 20760 20348 ...
##  $ X0_0_1  : num  181676 183347 189180 197935 209654 ...
##  $ X0_0_2  : num  152110 153658 158302 165449 175688 ...
##  $ X0_0_3  : num  153383 155001 160064 167504 177861 ...
##  $ X0_0_4  : num  150199 152044 157029 164412 174582 ...
##  $ X0_0_5  : num  157769 159504 165478 173945 185550 ...
##  $ X0_0_6  : num  157492 159423 165380 173741 185011 ...
##  $ X0_0_7  : num  163998 165903 172040 180626 192484 ...
##  $ X0_0_8  : num  160719 162364 167986 176187 187369 ...
##  $ X0_0_9  : num  158611 160418 165957 173936 184680 ...
##  $ X0_0_10 : num  152766 154485 160248 168237 179351 ...
##  $ X0_0_11 : num  144377 146154 151480 159292 169859 ...
##  $ X0_0_12 : num  155710 157550 163646 172042 183956 ...
##  $ X0_0_13 : num  162645 164493 170546 179213 190696 ...
##  $ X0_0_14 : num  147321 149000 154870 163220 174498 ...
##  $ X0_0_15 : num  157033 158879 164907 173405 185125 ...
##  $ X0_0_16 : num  150163 152295 158376 167082 178636 ...
##  $ X0_0_17 : num  149176 151136 157494 166112 177624 ...
##  $ X0_0_18 : num  143700 145653 150989 158896 169646 ...
##  $ X0_0_19 : num  148948 150521 156191 164286 175496 ...
##  $ X0_0_20 : num  145147 147187 152912 161073 171831 ...
##  $ X0_0_21 : num  142052 143518 149019 156608 167246 ...
##  $ X0_0_22 : num  146214 148212 153862 161936 172927 ...
##  $ X0_0_23 : num  140830 142385 148023 155925 166785 ...
##  $ X0_0_24 : num  134855 136393 141402 148690 158693 ...
##  $ X0_0_25 : num  149492 151342 156745 164620 175547 ...
##  $ X0_0_26 : num  141515 143191 148426 155987 166452 ...
##  $ X0_0_27 : num  143168 145143 150564 158322 168999 ...
##  $ X0_0_28 : num  137947 139703 144873 152631 163225 ...
##  $ X0_0_29 : num  130303 131934 136797 143906 153586 ...
##  $ X0_0_30 : num  140933 142587 147920 155537 165961 ...
##  $ X0_0_31 : num  132977 134740 139798 147036 156743 ...
##  $ X0_0_32 : num  152496 154112 159580 167378 178165 ...
##  $ X0_0_33 : num  149542 151175 156140 163476 173447 ...
##  $ X0_0_34 : num  136258 137622 142480 149640 159691 ...
##  $ X0_0_35 : num  142896 144491 149719 157530 167855 ...
##  $ X0_0_36 : num  146818 148602 153988 161864 172516 ...
##  $ X0_0_37 : num  139913 141419 146601 153831 163993 ...
##  $ X0_0_38 : num  143241 145000 150354 158056 168380 ...
##  $ X0_0_39 : num  143393 145107 150475 158123 168609 ...
##  $ X0_0_40 : num  135160 136890 141943 149198 158987 ...
##  $ X0_0_41 : num  134769 136296 141351 148585 158703 ...
##  $ X0_0_42 : num  144394 145768 150783 157971 167928 ...
##  $ X0_0_43 : num  138399 140104 145184 152572 162749 ...
##  $ X0_0_44 : num  124673 126352 131224 138081 147141 ...
##  $ X0_0_45 : num  114587 115846 120020 125885 134350 ...
##  $ X0_0_46 : num  110348 111676 116271 122616 131179 ...
##  $ X0_0_47 : num  117456 119363 124589 131727 140869 ...
##  $ X0_1_0  : num  22810 23128 23870 24799 25829 ...
##  $ X0_1_1  : num  145029 147034 153453 161576 173377 ...
##  $ X0_1_2  : num  126534 128386 133739 141434 152168 ...
##  $ X0_1_3  : num  139343 141248 147201 155430 167194 ...
##  $ X0_1_4  : num  139499 140724 145191 151964 161708 ...
##  $ X0_1_5  : num  144551 145739 150218 157163 166766 ...
##  $ X0_1_6  : num  143214 144462 149041 155715 165060 ...
##  $ X0_1_7  : num  151872 153319 158240 165366 175326 ...
##  $ X0_1_8  : num  141396 143111 148785 156579 167330 ...
##  $ X0_1_9  : num  160825 162958 168962 177289 188668 ...
##  $ X0_1_10 : num  133036 134553 139455 146616 156589 ...
##  $ X0_1_11 : num  152913 154811 160957 169438 180925 ...
##  $ X0_1_12 : num  143398 144777 150134 157720 168074 ...
##  $ X0_1_13 : num  155655 156998 161956 169314 179452 ...
##  $ X0_1_14 : num  132847 133928 138343 144997 154503 ...
##  $ X0_1_15 : num  137801 139640 145204 153077 163597 ...
##  $ X0_1_16 : num  133666 135220 140380 147735 157941 ...
##  $ X0_1_17 : num  145284 146605 152138 160181 170900 ...
##  $ X0_1_18 : num  133894 135037 139321 146012 155380 ...
##  $ X0_1_19 : num  142898 144185 149398 156778 167194 ...
##  $ X0_1_20 : num  134633 136386 141744 149450 159842 ...
##  $ X0_1_21 : num  141251 142372 147568 154874 165134 ...
##  $ X0_1_22 : num  136226 138091 143321 150859 161098 ...
##  $ X0_1_23 : num  141314 142956 148132 155448 165359 ...
##  $ X0_1_24 : num  129189 130531 135453 142561 152295 ...
##  $ X0_1_25 : num  140901 142546 147494 154677 164453 ...
##  $ X0_1_26 : num  134427 135692 140321 147041 156181 ...
##  $ X0_1_27 : num  138449 140122 145260 152646 162672 ...
##  $ X0_1_28 : num  132034 133511 138861 146125 156225 ...
##  $ X0_1_29 : num  141238 143038 147991 155146 164859 ...
##  $ X0_1_30 : num  132174 133310 138048 144650 153980 ...
##  $ X0_1_31 : num  140381 141863 145986 152715 162049 ...
##  $ X0_1_32 : num  126915 128482 133721 140874 150809 ...
##  $ X0_1_33 : num  139928 141649 146794 154421 164627 ...
##  $ X0_1_34 : num  129303 130638 135409 142312 151965 ...
##  $ X0_1_35 : num  136006 137780 143515 151541 162142 ...
##  $ X0_1_36 : num  124382 126026 131319 138754 148893 ...
##  $ X0_1_37 : num  135698 137395 143070 150813 161206 ...
##  $ X0_1_38 : num  120918 122818 128164 135497 145464 ...
##  $ X0_1_39 : num  135871 137852 143947 152148 163011 ...
##  $ X0_1_40 : num  124747 126548 131563 138408 147764 ...
##  $ X0_1_41 : num  132137 133926 139524 147177 157657 ...
##  $ X0_1_42 : num  137212 138874 144363 152079 162437 ...
##  $ X0_1_43 : num  146906 148537 153929 161533 172109 ...
##  $ X0_1_44 : num  129890 131522 136845 144248 154354 ...
##  $ X0_1_45 : num  135719 137406 142752 150099 160076 ...
##  $ X0_1_46 : num  115419 116985 121873 128508 137603 ...
##  $ X0_1_47 : num  122226 123630 127756 133947 142124 ...
##  $ X0_2_0  : num  201430 203145 208703 217131 228514 ...
##  $ X0_2_1  : num  136834 138043 142287 148472 157794 ...
##   [list output truncated]
  • Read the data and prepare : Each scan-o-matic scanner can accommodate 4 plates. In this case the plates are arranged as below,

  • Plate0: Plate7_Basal
  • Plate1: Plate8_Basal
  • Plate2: Plate7_AceticAcid
  • Plate3: Plate8_AceticAcid

Each plate has 1536 colonies, i.e. 384 strains x 3 replicates + 384 spatial controls.

The FIRST COLUMN is just the image number, with 0 being the first image.

After the first column there are 1536 * 4 = 6144 more columns, i.e. the data of each colony is one column. The naming format is as follows:

X[plate_number]_[row_number]_[column_number]

All numbers start from zero. Therefore, plate numbers range from 0 to 3. Each 1536-format plate has 32 rows and 48 columns, so row numbers range from 0 to 31 and column numbers from 0 to 47.
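The naming rule can be expressed as a small helper that converts a plate position into the corresponding column name. This is a sketch under an assumption: rows are taken to be labelled A to Z followed by AA to AF (32 rows), which is consistent with the positions used in this document but is not stated explicitly. The function name `location_to_colname` is hypothetical.

```r
# Convert a 1536-format position (e.g. "U4") and a zero-based plate index
# into the column name used in the flattened growth curve file
location_to_colname <- function(location, plate_index) {
  row_letters <- sub("[0-9]+$", "", location)               # e.g. "AE"
  col_number  <- as.integer(sub("^[A-Z]+", "", location))   # e.g. 23
  # Assumed row labels: A..Z, then AA..AF (32 rows in total)
  row_labels  <- c(LETTERS, paste0("A", LETTERS[1:6]))
  row_index   <- match(row_letters, row_labels) - 1L        # zero-based
  paste0("X", plate_index, "_", row_index, "_", col_number - 1L)
}

location_to_colname("U4", 0)     # plate 0, row U, column 4
location_to_colname("E4", 2)     # plate 2, row E, column 4
location_to_colname("AE23", 3)   # plate 3, row AE, column 23
```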

Now we extract the data of only 3 strains from the entire dataset, i.e. one strain that displayed acetic acid sensitivity, one strain with slight acetic acid tolerance, and one control strain. The selected strains and their respective positions are obtained from the raw dataset whole_data_CRISPRi_aa.

Strain Characteristics | Strain name | Plate Number | Location1536 | Colname Basal | Colname acetic
Acetic acid Tolerant | "POL2-NRg-1" | Plate7 | U4 | X0_20_3 | X2_20_3
Acetic acid sensitive | "RRP15-TRg-4" | Plate7 | E4 | X0_4_3 | X2_4_3
Control strain1 | "CC23" | Plate8 | AE23 | X1_30_22 | X3_30_22

Therefore, we extract the data in the above columns, together with the first column (image number), and save them in a new variable. We then change the column names to the [gRNA]_[condition] format.

Growth_curve_data_selected <- Growth_curve_data[, c("X", "X0_20_3", "X2_20_3", "X0_4_3", "X2_4_3", "X1_30_22", "X3_30_22")]
colnames(Growth_curve_data_selected) <- c("Time", "POL2-NRg-1_Basal", "POL2-NRg-1_Acetic", "RRP15-TRg-4_Basal", "RRP15-TRg-4_Acetic", "CC23_Basal", "CC23_Acetic")
str(Growth_curve_data_selected)
## 'data.frame':    255 obs. of  7 variables:
##  $ Time              : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ POL2-NRg-1_Basal  : num  113652 114873 119049 124818 132739 ...
##  $ POL2-NRg-1_Acetic : num  85018 85018 85057 85317 85615 ...
##  $ RRP15-TRg-4_Basal : num  134222 135491 139949 146494 156017 ...
##  $ RRP15-TRg-4_Acetic: num  81686 81982 82500 83028 83565 ...
##  $ CC23_Basal        : num  125292 126829 131175 137350 145429 ...
##  $ CC23_Acetic       : num  96983 97229 97975 98813 99707 ...

Images are automatically taken 20 minutes apart. Therefore, image number * 20 / 60 gives the time point in hours. We convert the first column to time points accordingly.

Growth_curve_data_selected[, 1] <- Growth_curve_data_selected[, 1]*20/60

Convert the data.frame to long format and save it in a new variable

library(reshape)
Growth_curve_data_selected_long <- reshape(data=Growth_curve_data_selected, idvar="Time",
                                     varying = colnames(Growth_curve_data_selected)[2:7],
                                     v.name=c("Population_size"),
                                     new.row.names = 1:30000,
                                     direction="long",
                                     timevar = "gRNA_condition",
                                     times = colnames(Growth_curve_data_selected)[2:7])
str(Growth_curve_data_selected_long)
## 'data.frame':    1530 obs. of  3 variables:
##  $ Time           : num  0 0.333 0.667 1 1.333 ...
##  $ gRNA_condition : chr  "POL2-NRg-1_Basal" "POL2-NRg-1_Basal" "POL2-NRg-1_Basal" "POL2-NRg-1_Basal" ...
##  $ Population_size: num  113652 114873 119049 124818 132739 ...
##  - attr(*, "reshapeLong")=List of 4
##   ..$ varying:List of 1
##   .. ..$ Population_size: chr [1:6] "POL2-NRg-1_Basal" "POL2-NRg-1_Acetic" "RRP15-TRg-4_Basal" "RRP15-TRg-4_Acetic" ...
##   .. ..- attr(*, "v.names")= chr "Population_size"
##   .. ..- attr(*, "times")= chr [1:6] "POL2-NRg-1_Basal" "POL2-NRg-1_Acetic" "RRP15-TRg-4_Basal" "RRP15-TRg-4_Acetic" ...
##   ..$ v.names: chr "Population_size"
##   ..$ idvar  : chr "Time"
##   ..$ timevar: chr "gRNA_condition"
  • Plot the graph
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Figure 9 (Part of Fig. 1 in Manuscript): Representative growth curves
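The plotting code is not shown in this report. As a minimal sketch of how a long-format table with the columns Time, gRNA_condition and Population_size could be drawn with ggplot2 and a loess smoother (the method named in the message above), consider the following. The data are simulated logistic curves, and the condition labels, axis titles and the `logistic` helper are hypothetical; this is not the code used for the manuscript figure.

```r
library(ggplot2)

# Simulated growth curves sampled every 20 minutes for 24 hours
time_h <- seq(0, 24, by = 1 / 3)
logistic <- function(t, K, r, t_mid) K / (1 + exp(-r * (t - t_mid)))

set.seed(3)
noise <- function(n) exp(rnorm(n, sd = 0.02))   # small multiplicative noise
toy_long <- rbind(
  data.frame(Time = time_h, gRNA_condition = "toy_Basal",
             Population_size = logistic(time_h, 2e6, 0.6, 8) * noise(length(time_h))),
  data.frame(Time = time_h, gRNA_condition = "toy_Acetic",
             Population_size = logistic(time_h, 1e6, 0.4, 12) * noise(length(time_h)))
)

# One smoothed curve per strain/condition combination
growth_plot <- ggplot(toy_long,
                      aes(x = Time, y = Population_size, colour = gRNA_condition)) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(x = "Time (h)", y = "Population size", colour = "Strain_condition")
growth_plot
```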

SCATTER PLOT : CORRELATION BETWEEN LPI GT MEAN ROUND 1 and LPI GT MEAN ROUND 2

Scatter plot showing the reproducibility of the two scan-o-matic screening rounds. For each strain, the mean of the three LPI_GT replicates in round 1 (X axis) is plotted against the corresponding mean in round 2 (Y axis). The CRISPRi control strains are indicated with green dots, acetic acid sensitive strains with red dots and acetic acid tolerant strains with blue dots. All other strains are indicated with black dots.

Figure 10 (Fig. 2A in Manuscript): DATA REPRODUCIBILITY

summary(stats_LPI_GT_Mean_RND1vsRND2_M3)
## 
## Call:
## lm(formula = LPI_GT_RND2_MEAN ~ LPI_GT_RND1_MEAN, data = Analysis_Final_3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.71004 -0.04877 -0.00146  0.04439  1.05360 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -0.009341   0.001014  -9.215   <2e-16 ***
## LPI_GT_RND1_MEAN  0.274544   0.005532  49.631   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0886 on 8832 degrees of freedom
##   (244 observations deleted due to missingness)
## Multiple R-squared:  0.2181, Adjusted R-squared:  0.218 
## F-statistic:  2463 on 1 and 8832 DF,  p-value: < 2.2e-16
cor(Analysis_Final_3$LPI_GT_RND1_MEAN, 
    Analysis_Final_3$LPI_GT_RND2_MEAN,  
    method = "pearson", 
    use = "complete.obs")
## [1] 0.4669872

The linear regression model (black dashed line) fitted to the data of all strains together gave a coefficient of determination R² = 0.22 and a Pearson correlation coefficient r = 0.47.

summary(stats_LPI_GT_Mean_RND1vsRND2_M3_selected)
## 
## Call:
## lm(formula = LPI_GT_RND2_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] ~ 
##     LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)], 
##     data = Analysis_Final_3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40299 -0.05268 -0.00899  0.03756  0.55375 
## 
## Coefficients:
##                                                                           Estimate
## (Intercept)                                                               0.000213
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] 0.581230
##                                                                           Std. Error
## (Intercept)                                                                 0.003588
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)]   0.010154
##                                                                           t value
## (Intercept)                                                                 0.059
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)]  57.240
##                                                                           Pr(>|t|)
## (Intercept)                                                                  0.953
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)]   <2e-16
##                                                                              
## (Intercept)                                                                  
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09715 on 872 degrees of freedom
##   (85 observations deleted due to missingness)
## Multiple R-squared:  0.7898, Adjusted R-squared:  0.7896 
## F-statistic:  3276 on 1 and 872 DF,  p-value: < 2.2e-16
cor(Analysis_Final_3$LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)], 
    Analysis_Final_3$LPI_GT_RND2_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)],  
    method = "pearson", 
    use = "complete.obs")
## [1] 0.8887079

The linear regression model (red dashed line) fitted to the data of the acetic acid sensitive and tolerant strains gave R2 = 0.79 and a Pearson correlation coefficient r = 0.89.

SCATTER PLOT LSC GT (IN BASAL CONDITION) VS LPI_GT

Scatterplot showing, for each CRISPRi strain, the relative generation time in the basal condition on the X-axis [Log Strain Co-efficient (LSC) of generation time (GT)] and the relative generation time under acetic acid stress (150mM acetic acid) compared to the basal condition on the Y-axis (LPI_GT). Each point is the mean of all replicates (n=6). For 198 acetic acid sensitive strains, the number of replicates is between 3 and 5 (n=3 for 135 strains; n=4 for 16 strains; n=5 for 47 strains), because not all replicates grew under acetic acid stress. CRISPRi control strains are shown as green dots. Based on our statistical analysis, strains with an FDR-adjusted P-value ≤ 0.1 and a mean LPI_GT > 0.165 (the maximum LPI_GT of the CRISPRi control strains) are designated acetic acid sensitive (red dots). Strains with an FDR-adjusted P-value ≤ 0.1 and a mean LPI_GT < -0.037 (the minimum LPI_GT of the CRISPRi control strains) are designated acetic acid tolerant (blue dots). The LPI_GT thresholds are indicated with gray dashed lines. Strains that do not meet both the adjusted P-value and the LPI_GT threshold are shown as black dots.
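The classification rule described above can be sketched as a small function. The thresholds (0.165, -0.037, FDR 0.1) are taken from the text; the function name `classify_strain`, the column names, and the toy values are hypothetical and only serve to illustrate the rule.

```r
# Thresholds come from the text; names and toy values are hypothetical
classify_strain <- function(lpi_gt_mean, p_adj,
                            upper = 0.165, lower = -0.037, fdr = 0.1) {
  label <- rep("unclassified", length(lpi_gt_mean))
  # A strain must pass the FDR cut-off before the LPI_GT threshold is applied
  significant <- !is.na(p_adj) & p_adj <= fdr & !is.na(lpi_gt_mean)
  label[significant & lpi_gt_mean > upper] <- "sensitive"
  label[significant & lpi_gt_mean < lower] <- "tolerant"
  label
}

toy <- data.frame(
  lpi_gt_mean = c(0.40, -0.20, 0.40, 0.05, NA),
  p_adj       = c(0.01,  0.05, 0.50, 0.01, 0.01)
)
toy$class <- classify_strain(toy$lpi_gt_mean, toy$p_adj)
```

In this toy example the third strain has a large LPI_GT but fails the FDR cut-off, and the fourth passes the FDR cut-off but lies within the range of the control strains, so both stay unclassified.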

Figure 11 (Fig. 2C in Manuscript): Normalized generation time (LSC GT) of strains in Basal condition vs Relative generation time (LPI GT) of strains in acetic acid condition compared to basal condition

VIOLIN PLOT

Violin plots display the spread and the distribution of the LPI GT data for all CRISPRi strains (ALL) and of the LPI GT values of the CRISPRi control strains (CONTROL).

  • Preparing a dataset for the violin plot of LPI GT (all strains) and LPI GT (control strains)
Violin_LPI_Mean_M3 <- data.frame()
R <- length(which((Analysis_Final_3$Control.gRNA==0)
                  &(!is.na(Analysis_Final_3$LPI_GT_Mean_all))
))
Violin_LPI_Mean_M3[1:R, 1] <- Analysis_Final_3$LPI_GT_Mean_all[which((Analysis_Final_3$Control.gRNA==0)
                                                                     &(!is.na(Analysis_Final_3$LPI_GT_Mean_all)))]
Violin_LPI_Mean_M3[1:R, 2] <- "ALL"
R2 <- length(which(Analysis_Final_3$Control.gRNA==1))
Violin_LPI_Mean_M3[(R+1):(R+R2), 1] <- Analysis_Final_3$LPI_GT_Mean_all[which(Analysis_Final_3$Control.gRNA==1)]
Violin_LPI_Mean_M3[(R+1):(R+R2), 2] <- "CONTROL"
colnames(Violin_LPI_Mean_M3)[1:2] <- c("Mean", "Label")
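As a possible alternative to filling the data.frame by row ranges, the same two groups can be built with logical indexing. This is a minimal sketch on a toy table; `toy` stands in for Analysis_Final_3, its values are invented, and it assumes that Control.gRNA contains no missing values.

```r
# Toy stand-in for Analysis_Final_3 (invented values)
toy <- data.frame(
  Control.gRNA    = c(0, 0, 1, 0, 1),
  LPI_GT_Mean_all = c(0.10, NA, 0.02, -0.30, -0.01)
)

is_control <- toy$Control.gRNA == 1
keep_all   <- toy$Control.gRNA == 0 & !is.na(toy$LPI_GT_Mean_all)

# One row per plotted value, labelled by group
violin_df <- data.frame(
  Mean  = c(toy$LPI_GT_Mean_all[keep_all], toy$LPI_GT_Mean_all[is_control]),
  Label = c(rep("ALL", sum(keep_all)), rep("CONTROL", sum(is_control)))
)
```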
Figure 12 (Fig. 2C INSET, in Manuscript): Violin-plots display the spread and the distribution of the LPI GT data

WORDCLOUD

We display the names of genes that are highly represented among the tolerant and the sensitive strains, i.e. genes for which CRISPRi targeting by multiple gRNAs produced the tolerant or sensitive phenotype. The relationship between CRISPRi repression of a gene and the observed phenotype is more reliable for these highly represented genes.
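The representation of a gene among the hits can be quantified by counting how many of its gRNA strains share the phenotype. A minimal sketch with base R is given below; the gene names are placeholders and the cut-off of two strains is an assumption chosen for illustration.

```r
# Placeholder gene targets of strains called tolerant (illustrative only)
tolerant_targets <- c("GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_A", "GENE_B")

# Number of tolerant strains per target gene, most frequent first
gene_counts <- sort(table(tolerant_targets), decreasing = TRUE)

# Genes supported by at least two independent gRNA strains (assumed cut-off)
multi_hit_genes <- names(gene_counts)[as.vector(gene_counts) >= 2]
```

Counts of this kind are what a word cloud scales its labels by.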

  • WORD CLOUD for the acetic acid TOLERANT strains
## Loading required package: RColorBrewer
Figure 13: Wordcloud for CRISPRi gene targets of acetic acid tolerant strains

  • WORD CLOUD for the acetic acid SENSITIVE strains
Figure 14: Wordcloud for CRISPRi gene targets of acetic acid sensitive strains

HISTOGRAM

NUMBER OF STRAINS/GENE AND gRNA DISTANCE FROM TSS

First, assign the gRNA names as row names of the Analysis_Final_3 data.frame

row.names(Analysis_Final_3) <- Analysis_Final_3$gRNA_name

Next, for this graph we fetch some additional information from a .csv file published as supplementary data in Smith et al., 2017. The file is also available in our COMPILED_DATA folder

Supplementary data from Smith et al., 2017 : smith_YEPGdata.csv

Smith_Yepg_data <- read.csv("COMPILED_DATA/smith_YEPGdata.csv", na.strings = "")
str(Smith_Yepg_data)
## 'data.frame':    8939 obs. of  28 variables:
##  $ ORF                       : chr  "YBL074C" "YBL074C" "YBL074C" "YBL074C" ...
##  $ gRNA_targeting_seq        : chr  "CCAGCGATAAGGAGGATCTT" "TGTGTCCTTTCTTCATCTCT" "AAAAGGAAAAAGTAATTAGG" "GTGAAAAGGAAAAAGTAATT" ...
##  $ Midpoint_TSS_dist         : int  -36 -135 -187 -190 -33 -90 -139 -178 -11 -15 ...
##  $ Norm_atac_seq_read_density: num  0.56 0.93 0.62 0.62 0.54 0.55 0.53 0.41 0.28 0.28 ...
##  $ Multiple_ORFs_Targeted    : int  0 0 0 0 0 0 1 1 0 0 ...
##  $ nearby_genes              : chr  NA NA NA NA ...
##  $ gene_name                 : chr  "AAR2" "AAR2" "AAR2" "AAR2" ...
##  $ guide_id                  : chr  "AAR2-NRg-3" "AAR2-NRg-4" "AAR2-TRg-15" "AAR2-TRg-16" ...
##  $ oligo_seq                 : chr  "GGGAGCTGCGATTGGCAGCCAGCGATAAGGAGGATCTTGTTTTAGAGCTAGAAATAGCAAG" "GGGAGCTGCGATTGGCAGTGTGTCCTTTCTTCATCTCTGTTTTAGAGCTAGAAATAGCAAG" "GGGAGCTGCGATTGGCAGAAAAGGAAAAAGTAATTAGGGTTTTAGAGCTAGAAATAGCAAG" "GGGAGCTGCGATTGGCAGGTGAAAAGGAAAAAGTAATTGTTTTAGAGCTAGAAATAGCAAG" ...
##  $ YPEG_._ATC1               : num  16.8 963.5 193.5 153.9 78.3 ...
##  $ YPEG_._ATC2               : num  15 1010.4 208.2 136.5 46.2 ...
##  $ YPEG_._ATC3               : num  14.3 896.3 268.5 123.7 34.5 ...
##  $ YPEG4                     : num  55.9 815.5 447.3 276.6 133.9 ...
##  $ YPEG5                     : num  54 577 427 450 159 ...
##  $ YPEG6                     : num  60.3 722.8 311.5 312.5 91.6 ...
##  $ YPD_._ATC7                : num  28.3 687.2 349.5 258.9 111.6 ...
##  $ YPD_._ATC8                : num  29.1 560.2 376.3 199.9 88.7 ...
##  $ YPD_._ATC9                : num  54.4 816.7 338.7 227.5 124.8 ...
##  $ YPD10                     : num  11.5 639 604.1 405.4 121.7 ...
##  $ YPD11                     : num  46.7 688 528.5 380.2 141.8 ...
##  $ YPD12                     : num  57 809 479 213 153 ...
##  $ Pool                      : int  2 2 2 2 1 2 1 1 1 1 ...
##  $ Log2_YPEG                 : num  -1.884 0.441 -0.823 -1.327 -1.276 ...
##  $ Log2_YPD                  : num  -0.043 -0.0491 -0.598 -0.5412 -0.3587 ...
##  $ YPEG_filter_25            : int  1 1 1 1 1 1 1 0 1 1 ...
##  $ YPD_filter_25             : int  1 1 1 1 1 1 1 0 1 1 ...
##  $ ORF_Category              : chr  "Essential" "Essential" "Essential" "Essential" ...
##  $ RNA_structure.Kcal.Mol.   : num  -60.6 -56.8 -50.5 -55.3 -56.4 -52.7 -54.6 -57.7 -60.7 -54.1 ...

Of these columns, the most useful for this study are:

  • Column No: 3 i.e. Midpoint_TSS_dist
  • Column No: 4 i.e. Norm_atac_seq_read_density
  • Column No: 5 i.e. Multiple_ORFs_Targeted
  • Column No: 6 i.e. nearby_genes

Copy only these four columns into the Analysis_Final_3 data.frame, matching rows by gRNA name

for(i in 1:nrow(Analysis_Final_3)){
  x <- which(row.names(Analysis_Final_3)[i]==Smith_Yepg_data$guide_id)
  if(length(x)==0){
    Analysis_Final_3[i, 104:107] <- NA
  } else {
    Analysis_Final_3[i, 104:107] <- Smith_Yepg_data[x, 3:6]
  }
}
colnames(Analysis_Final_3)[104:107] <- colnames(Smith_Yepg_data)[3:6]
str((Analysis_Final_3)[104:107])
## 'data.frame':    9078 obs. of  4 variables:
##  $ Midpoint_TSS_dist         : int  -36 -135 -187 -190 -33 -90 -139 -178 -11 -15 ...
##  $ Norm_atac_seq_read_density: num  0.56 0.93 0.62 0.62 0.54 0.55 0.53 0.41 0.28 0.28 ...
##  $ Multiple_ORFs_Targeted    : int  0 0 0 0 0 0 1 1 0 0 ...
##  $ nearby_genes              : chr  NA NA NA NA ...
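The row-by-row lookup above can also be written with `match()`, which returns the position of each gRNA name in the guide_id column and NA where there is no match. The sketch below uses toy tables with invented guide names and values, and assumes each guide_id occurs only once in the lookup table.

```r
# Toy stand-ins; guide IDs and values are invented for illustration
analysis <- data.frame(gRNA_name = c("G1-NRg-1", "G2-TRg-3", "CTRL-X"))
smith <- data.frame(
  guide_id                   = c("G2-TRg-3", "G1-NRg-1"),
  Midpoint_TSS_dist          = c(-90L, -36L),
  Norm_atac_seq_read_density = c(0.55, 0.56)
)

# Position of each strain's guide in the lookup table; NA where no guide_id matches
idx <- match(analysis$gRNA_name, smith$guide_id)

# Indexing with NA yields NA, so unmatched strains get missing values
analysis$Midpoint_TSS_dist          <- smith$Midpoint_TSS_dist[idx]
analysis$Norm_atac_seq_read_density <- smith$Norm_atac_seq_read_density[idx]
```

Referring to columns by name rather than by position (104:107, 3:6) also keeps the lookup valid if columns are later added to either table.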

Estimate the gRNA frequency

gRNA_Freq <- data.frame(sort(table(Analysis_Final_3$GENE), decreasing = TRUE))

Plot the graphs

Figure 15 (Figure S7 in Manuscript): Histogram of the number of strains per target gene in the CRISPRi library (TOP PANEL). Histogram of the gRNA distance from the transcription start site (TSS) of the target genes (BOTTOM PANEL)

NORMALIZED GENERATION TIME (LSC GT) IN BASAL AND UNDER ACETIC ACID STRESS

First, the mean LSC GT under acetic acid stress was recalculated for every strain from the non-missing replicates only

for(i in 1:nrow(Analysis_Final_3)){
  test1 <- t(Analysis_Final_3[i, 22:27])
  x1 <- sum(!is.na(test1[, 1]))
  AA_GT_Mean_temp <- mean(test1[which(!is.na(test1[, 1]))])
  Analysis_Final_3[i, 32] <- AA_GT_Mean_temp
}
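The per-row mean over non-missing replicates can also be computed without a loop using `rowMeans()` with `na.rm = TRUE`. The matrix below is a toy example with invented values; in the analysis the input would be the six replicate columns.

```r
# Toy replicate matrix: rows are strains, columns are six replicates (invented values)
reps <- rbind(
  c(0.10, 0.12, NA, 0.11, 0.09, 0.13),
  c(NA,   NA,   NA, 0.50, 0.40, 0.60),
  c(NA,   NA,   NA, NA,   NA,   NA)
)

n_valid    <- rowSums(!is.na(reps))         # replicates that grew
mean_valid <- rowMeans(reps, na.rm = TRUE)  # NaN when no replicate grew
```

As with `mean()` on an empty vector in the loop, a strain without any valid replicate yields NaN rather than NA.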

Plot the histogram

Figure 16 (fig. 2B in manuscript): Histogram of strain growth in the basal condition (TOP PANEL). Histogram of strain growth at 150mM acetic acid (BOTTOM PANEL)

ADDITIONAL INFO

  • Adding the ORF category (Essential / Respiratory / Other) to the whole_Gene_list_Final dataframe from the Smith et al., 2017 dataset Smith_Yepg_data. This information can be used later to visualize the data
for(i in 1:nrow(whole_Gene_list_Final)){
  x <- as.character(unique(Smith_Yepg_data$ORF_Category[(Smith_Yepg_data$gene_name %in% whole_Gene_list_Final$LIB_ID[i])]))
  if(length(x)==0){
    whole_Gene_list_Final[i, 8] <- NA
  } else{
    whole_Gene_list_Final[i, 8] <- x
  }
}
colnames(whole_Gene_list_Final)[8] <- "ORF_Category"
whole_Gene_list_Final$ORF_Category <- as.factor(whole_Gene_list_Final$ORF_Category)
#Missing values were obtained from SGD
whole_Gene_list_Final$ORF_Category[which(is.na(whole_Gene_list_Final$ORF_Category))] <- c("Respiratory",
                                                                                          "Other", 
                                                                                          "Essential", 
                                                                                          "Essential", 
                                                                                          "Respiratory", 
                                                                                          "Other", 
                                                                                          "Respiratory", 
                                                                                          "Respiratory", 
                                                                                          "Other", 
                                                                                          "Respiratory", 
                                                                                          "Essential", 
                                                                                          "Essential", 
                                                                                          "Respiratory")
str(whole_Gene_list_Final)
## 'data.frame':    1617 obs. of  8 variables:
##  $ LIB_ID      : chr  "AAR2" "AAT1" "AAT2" "ABD1" ...
##  $ SGD_DB_ID   : chr  "S000000170" "S000001589" "S000004017" "S000000440" ...
##  $ SYS_ID      : chr  "YBL074C" "YKL106W" "YLR027C" "YBR236C" ...
##  $ GENE_SYM    : chr  "AAR2" "AAT1" "AAT2" "ABD1" ...
##  $ NAME        : chr  "A1-Alpha2 Repression" "Aspartate AminoTransferase" "Aspartate AminoTransferase" NA ...
##  $ PHENOTYPE   : chr  "Essential gene; conditional mutant is heat sensitive, loses viability at elevated temperature and displays elev"| __truncated__ "Non-essential gene; null mutant has a reduced respiratory growth rate and decreased competitive fitness on non-"| __truncated__ "Non-essential gene in S288C, but essential in the Sigma1278b background; S288C null mutant displays a decreased"| __truncated__ "Essential gene; temperature-sensitive mutation causes decreasing protein synthesis upon temperature shift; repr"| __truncated__ ...
##  $ DESCRIPTION : chr  "Component of the U5 snRNP complex; required for splicing of U3 precursors; originally described as a splicing f"| __truncated__ "Mitochondrial aspartate aminotransferase; catalyzes the conversion of oxaloacetate to aspartate in aspartate an"| __truncated__ "Cytosolic aspartate aminotransferase involved in nitrogen metabolism; localizes to peroxisomes in oleate-grown cells" "Methyltransferase; catalyzes the transfer of a methyl group from S-adenosylmethionine to the GpppN terminus of "| __truncated__ ...
##  $ ORF_Category: Factor w/ 3 levels "Essential","Other",..: 1 3 3 1 1 3 1 3 1 1 ...

BIOSCREEN LIQUID MICRO-CULTIVATION ANALYSIS

The results from the scan-o-matic phenomics were validated in a liquid micro-cultivation growth experiment using the Bioscreen platform

ATC TITRATION DATA ANALYSIS

Some CRISPRi strains were selected for a liquid growth experiment in the Bioscreen to identify an ATc concentration that induces a growth inhibition in liquid YNB medium (basal condition) similar to the one we observed in our quantitative spot test assay on YNB agar medium with 7.5 ug/ml of ATc. Here we analyze that data set. The strains were selected based on the competitive growth assay of the CRISPRi library in liquid YPD medium with and without 250 ng/ml of ATc (Smith et al., 2017).

DATA PREPARATION FOR ATC DOSAGE RESPONSE

  • Compiled Data Import: The ATc titration data is available in compiled form in the COMPILED_DATA folder

ATc titration data compiled : ATc_liq_titer_data.csv

Atc_liq_data <- read.csv("COMPILED_DATA/ATc_liq_titer_data.csv", na.strings = "NaN", header = TRUE)
str(Atc_liq_data)
## 'data.frame':    80 obs. of  9 variables:
##  $ Well_No          : int  9 19 29 39 49 59 69 79 89 99 ...
##  $ gRNA_name        : chr  "ACT1-NRg-5" "ACT1-NRg-5" "ACT1-NRg-5" "ACT1-NRg-5" ...
##  $ Atc_concentration: num  0 0.25 1 2 3 5 7.5 10 15 25 ...
##  $ Lag_R1           : num  5.95 5.81 5.89 2.87 3.95 ...
##  $ GT_R1            : num  1.96 2.03 2.16 3.9 5.06 ...
##  $ Yield_R1         : num  2.98 3.17 2.77 1.42 1.03 ...
##  $ Lag_R2           : num  5.53 5.69 5.49 3.62 3.97 ...
##  $ GT_R2            : num  2 2.02 2.17 4.59 4.88 ...
##  $ Yield_R2         : num  2.87 3.02 2.96 1.21 1.05 ...

Note that in the liquid experiment three phenotypes were estimated: the growth LAG phase, the GENERATION TIME and the biomass YIELD

  • Extract CRISPRi control strain (CC23) data
Atc_liq_cc23 <- Atc_liq_data[which(Atc_liq_data$gRNA_name=="Ctrl-CC23"), ]
  • Extract additional information such as the strain’s gRNA names and ATc concentrations used for titration
uniq_gRNA <- unique(Atc_liq_data$gRNA_name)
uniq_conc <- unique(Atc_liq_data$Atc_concentration)
  • Data transformation (log)
Atc_liq_data[, 10:15] <- log(Atc_liq_data[, 4:9])
  • Estimate Normalized growth (LSC) for LAG , GENERATION TIME and YIELD

To determine the normalized growth, or Log Strain Co-efficient (LSC), we use the data of the control strain CC23. Subtracting the log-transformed growth phenotypes of CC23 from the log-transformed phenotypes of each strain at the same ATc concentration gives the LSC value of that strain for each phenotype at that concentration.

for(i in 1:length(uniq_gRNA)){
  for(j in 1:length(uniq_conc)){
    Atc_liq_data[which(Atc_liq_data$gRNA_name==uniq_gRNA[i]&
                         Atc_liq_data$Atc_concentration==uniq_conc[j]), 16:21] <- 
      Atc_liq_data[which(Atc_liq_data$gRNA_name==uniq_gRNA[i]&
                           Atc_liq_data$Atc_concentration==uniq_conc[j]), 10:15] - 
      Atc_liq_data[which(Atc_liq_data$gRNA_name=="Ctrl-CC23"&
                           Atc_liq_data$Atc_concentration==uniq_conc[j]), 10:15]
  }
}
  • Estimate the Mean and Standard deviation of the LSC values for each phenotype
for(i in 1:nrow(Atc_liq_data)){
  Atc_liq_data[i, 22] <- mean(as.numeric(Atc_liq_data[i, c(16, 19)][which(!is.na(Atc_liq_data[i, c(16, 19)]))]))
  Atc_liq_data[i, 23] <- sd(as.numeric(Atc_liq_data[i, c(16, 19)][which(!is.na(Atc_liq_data[i, c(16, 19)]))]))
  Atc_liq_data[i, 24] <- mean(as.numeric(Atc_liq_data[i, c(17, 20)][which(!is.na(Atc_liq_data[i, c(17, 20)]))]))
  Atc_liq_data[i, 25] <- sd(as.numeric(Atc_liq_data[i, c(17, 20)][which(!is.na(Atc_liq_data[i, c(17, 20)]))]))
  Atc_liq_data[i, 26] <- mean(as.numeric(Atc_liq_data[i, c(18, 21)][which(!is.na(Atc_liq_data[i, c(18, 21)]))]))
  Atc_liq_data[i, 27] <- sd(as.numeric(Atc_liq_data[i, c(18, 21)][which(!is.na(Atc_liq_data[i, c(18, 21)]))]))
}
  • Assign new column names
colnames(Atc_liq_data)[10:15] <- paste0("log_", colnames(Atc_liq_data)[4:9])
colnames(Atc_liq_data)[16:21] <- paste0("LSC_", colnames(Atc_liq_data)[4:9])
colnames(Atc_liq_data)[22:23] <- paste0(c("Mean_", "SD_"), "LSC_Lag")
colnames(Atc_liq_data)[24:25] <- paste0(c("Mean_", "SD_"), "LSC_GT")
colnames(Atc_liq_data)[26:27] <- paste0(c("Mean_", "SD_"), "LSC_Yield")
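The LSC calculation above amounts to a log-ratio of each strain to the control at the same ATc dose. The following sketch reproduces that step for a single phenotype on a toy table; the strain name "STRAIN-1" and all generation times are invented for illustration.

```r
# Toy data: generation times (h) for one strain and the control (invented values)
toy <- data.frame(
  gRNA_name         = rep(c("STRAIN-1", "Ctrl-CC23"), each = 2),
  Atc_concentration = rep(c(0, 7.5), times = 2),
  GT_R1             = c(2.0, 4.0, 2.0, 2.0)
)

# Control rows, then the control row matching each row's ATc concentration
ctrl     <- toy[toy$gRNA_name == "Ctrl-CC23", ]
ctrl_row <- match(toy$Atc_concentration, ctrl$Atc_concentration)

# LSC = log(strain phenotype) - log(control phenotype) at the same ATc dose
toy$LSC_GT_R1 <- log(toy$GT_R1) - log(ctrl$GT_R1[ctrl_row])
```

Here the toy strain has an LSC of 0 without ATc and of log(2) ≈ 0.69 at 7.5 ug/ml, i.e. its generation time is twice that of the control at that dose, while the control itself is 0 by construction.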

ATc DOSAGE RESPONSE VISUALIZATION BY SCATTER-PLOT

First, the dataset is trimmed to a subset containing only the following gRNA_names: ACT1-NRg-5, ACT1-NRg-8, SEC21-NRg-5, VPS1-TRg-1. These gRNAs were previously shown to induce strong CRISPRi-mediated repression that ultimately caused lethality or very poor growth. These strains were also used for the ATc titration on YNB agar plates by Qualitative Spot-Test Assay. We also include another CRISPRi control strain, Ctrl-CC11, to show how it performed compared to the other strains.

name_gRNA_atc_titer <- c("ACT1-NRg-5", "ACT1-NRg-8", "Ctrl-CC11", "SEC21-NRg-5", "VPS1-TRg-1")
test <- data.frame()
Atc_titer_subset <- data.frame()
for(i in 1:length(name_gRNA_atc_titer)){
  test <- Atc_liq_data[which(Atc_liq_data$gRNA_name==name_gRNA_atc_titer[i]), ]
  Atc_titer_subset <- rbind(Atc_titer_subset, test)
}
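A subset of this kind can also be taken with `%in%`. One difference is worth noting: `%in%` keeps the original row order, whereas the loop above groups the rows in the order of the name vector. The sketch below shows both on a toy table with placeholder names.

```r
# Toy table with placeholder strain names
toy <- data.frame(
  gRNA_name = c("B", "A", "C", "A", "B"),
  value     = 1:5
)
wanted <- c("A", "B")

# Keeps the original row order of `toy`
subset_in <- toy[toy$gRNA_name %in% wanted, ]

# Reorders the rows so that they are grouped in the order given by `wanted`
subset_sorted <- subset_in[order(match(subset_in$gRNA_name, wanted)), ]
```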
  • PLOT NORMALIZED LAG, GENERATION TIME AND YIELD
Figure 17 (fig. S8B in manuscript): Scatter plot to display ATC dosage response on Lag phase

Figure 18 (fig. S8B in manuscript): Scatter plot to display ATC dosage response on Generation time

Figure 19 (fig. S8B in manuscript): Scatter plot to display ATC dosage response on Yield

VALIDATION DATA ANALYSIS

To validate the acetic acid sensitivity or tolerance observed for the CRISPRi strains in the scan-o-matic screening, selected strains were grown in liquid YNB medium using the Bioscreen platform. The 48 most acetic acid sensitive strains (we initially selected the 50 most sensitive strains, but two were eliminated because of their poor growth in the basal condition) and the 50 most tolerant CRISPRi strains from the scan-o-matic analysis were selected for validation. In addition, all CRISPRi strains with gRNAs targeting any of the following 12 genes were included: RPT4, RPN9, PRE4, MRPL10, MRPL4, SEC27, MIA40, VPS45, PUP3, VMA3, SEC62, COG1. In total, 176 strains were grown together with 7 control strains in liquid medium.

EXTRACTION OF VALIDATION STRAINS DATA FROM SCAN-O-MATIC DATA

  • First we obtain the list of the selected genes
select_genes <- read.table("COMPILED_DATA/selected_genes.txt", header = FALSE, sep = "\t", as.is = TRUE)
  • Extracting all CRISPRi strains that are targeting the genes in the list
y <- vector(mode = "numeric", length = 0)
for(i in 1:length(select_genes$V1)){
  x <- which(Analysis_Final_3$GENE==select_genes$V1[i])
  y <- c(y, x)
}
select_strains <- Analysis_Final_3$gRNA_name[y]
  • Getting the 50 most acetic acid sensitive and the 50 most acetic acid tolerant strains. For this purpose, we have already prepared a .csv file with the data of these strains. This file is also available in the COMPILED_DATA folder

50 most tolerant and sensitive strains from Scan-o-matic : bottom_top_50.csv

bot_top_50 <- read.csv("COMPILED_DATA/bottom_top_50.csv", stringsAsFactors = FALSE, header = TRUE)
  • Eliminating any strains that grew very poorly in the basal medium, i.e. strains for which 3 or fewer replicates grew in the basal condition and whose mean LSC GT is greater than 0.1 (a generation time roughly 10% longer than that of the control strain)
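# Guard against an empty index: x[-integer(0), ] would drop every row
drop_idx <- which(bot_top_50$n_CTRL<4 & bot_top_50$CTRL_GT_Mean_all > 0.1)
if(length(drop_idx) > 0) bot_top_50 <- bot_top_50[-drop_idx, ]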
bot_top_50_strains <- bot_top_50$gRNA_name
  • Making a union of the two sets
validation_strains <- as.character(union(bot_top_50_strains, select_strains))
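One caveat when removing rows with a negated `which()`: if no row meets the condition, `which()` returns an empty index and negative indexing with it returns zero rows instead of the full table. The toy example below (invented values, hypothetical column `gt`) shows the behaviour and one way to guard against it.

```r
# Toy table in which no strain meets the exclusion criterion (invented values)
toy <- data.frame(id = 1:3, n_CTRL = c(6, 6, 5), gt = c(0.02, 0.05, 0.01))

drop_idx <- which(toy$n_CTRL < 4 & toy$gt > 0.1)  # integer(0): nothing to drop
rows_after_unguarded <- nrow(toy[-drop_idx, ])    # every row is lost

# Guarded version: only drop rows when there is something to drop
kept <- if (length(drop_idx) > 0) toy[-drop_idx, ] else toy
```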

These strains were extracted from the main collection and arrayed in two 96 well microtiter plates. The plate layout is available in the “/RAW_DATA/BS_VAL_SCR/STRAIN_MAP_VAL_EXP” folder

Plate layout : Plate_layout_Liquid_Growth_Exp.xlsx

The strains were grown in liquid YNB medium (basal condition) and in liquid YNB medium supplemented with 150mM (Experiment number 1-3) or 125mM (Experiment number 4-6) of acetic acid. For each strain, 3 independent replicates were included for each growth condition.

VALIDATION DATA IMPORT

The raw data of bioscreen runs are saved in the BS_VAL_SCR folder within the RAW_DATA folder. The raw files are organized in the following format 20200511_VAL_PLATE[microtiter plate number].[Experiment number]_CTRL_AA[acetic acid concentration in mM]_Trimmed
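The naming scheme makes it possible to recover the plate, experiment number and acetic acid concentration from a file name. The sketch below parses one name with a regular expression; the example name is constructed from the pattern above and is hypothetical, and the sketch assumes that the dot between plate and experiment number is literal and makes no assumption about the file extension.

```r
# Hypothetical file name built from the naming pattern (no extension assumed)
fname <- "20200511_VAL_PLATE1.2_CTRL_AA150_Trimmed"

pattern <- "^([0-9]{8})_VAL_PLATE([0-9]+)\\.([0-9]+)_CTRL_AA([0-9]+)_Trimmed"
parts <- regmatches(fname, regexec(pattern, fname))[[1]]

# parts[1] is the full match; the capture groups follow in order
meta <- list(
  date       = parts[2],
  plate      = as.integer(parts[3]),
  experiment = as.integer(parts[4]),
  aa_mM      = as.integer(parts[5])
)
```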

In each raw file, the growth data of the strains in the basal condition are in wells 1-100, and the growth data under acetic acid stress (at the concentration indicated in the raw file name) are in wells 101-200, in the same strain order. For each plate, the wells can be linked to the strain names (gRNA names) using the plate layout file mentioned above in VALIDATION DATA ANALYSIS.

For the ease of analysis, the bioscreen raw data was compiled and saved as a .csv file in the COMPILED_DATA folder

bioscreen compiled data : Validation_Bioscreen_data.csv

  • Import bioscreen data
Val_data <- read.csv("COMPILED_DATA/Validation_Bioscreen_data.csv", na.strings = "NaN", header = TRUE)
str(Val_data)
## 'data.frame':    200 obs. of  40 variables:
##  $ Container.Name  : chr  "Well 1" "Well 2" "Well 3" "Well 4" ...
##  $ gRNA_name       : chr  "RSM28-TRg-6" "RGL1-NRg-7" "COG1-NRg-3" "COG1-TRg-1" ...
##  $ CTRL_GT_Mean_all: chr  "0.006053539" "-0.039136252" "-0.04627155" "-0.011279953" ...
##  $ LPI_GT_Mean_all : chr  "-0.191825576" "-0.278454226" "0.166542866" "0.490160515" ...
##  $ Ctrl_Lag_R1     : num  2.27 2.09 2.2 2.24 2.13 ...
##  $ Ctrl_GT_R1      : num  2.65 2.57 2.61 2.7 2.59 ...
##  $ Ctrl_Yield_R1   : num  1.84 2.04 1.97 2 1.97 ...
##  $ AA150_Lag_R1    : num  54.5 40.8 47 47.1 46.7 ...
##  $ AA150_GT_R1     : num  20 18.3 17.8 27.2 14.6 ...
##  $ AA150_Yield_R1  : num  0.225 0.25 0.294 0.208 0.475 ...
##  $ Ctrl_Lag_R2     : num  2.09 1.96 2.03 2.1 1.97 ...
##  $ Ctrl_GT_R2      : num  2.94 2.53 2.58 2.68 2.57 ...
##  $ Ctrl_Yield_R2   : num  1.86 2.24 2.1 1.96 2.06 ...
##  $ AA150_Lag_R2    : num  38.9 34.7 36.5 35.7 34.2 ...
##  $ AA150_GT_R2     : num  18.7 10.4 15.7 23.6 18.8 ...
##  $ AA150_Yield_R2  : num  0.246 0.654 0.408 0.26 0.4 ...
##  $ Ctrl_Lag_R3     : num  2.33 2.36 2.25 2.35 2.19 ...
##  $ Ctrl_GT_R3      : num  2.8 2.74 2.66 2.75 2.5 ...
##  $ Ctrl_Yield_R3   : num  2.04 2.06 2.19 2.24 2.61 ...
##  $ AA150_Lag_R3    : num  60 49.9 54.3 57 49.7 ...
##  $ AA150_GT_R3     : num  29.2 10.8 12.6 35.7 14.3 ...
##  $ AA150_Yield_R3  : num  0.0891 0.401 0.3597 0.0628 0.5105 ...
##  $ Ctrl_Lag_R4     : num  2.28 2.25 2.19 2.25 2.12 ...
##  $ Ctrl_GT_R4      : num  2.92 2.75 2.75 2.89 2.6 ...
##  $ Ctrl_Yield_R4   : num  2.21 2.03 1.94 1.76 2.38 ...
##  $ AA125_Lag_R4    : num  20.2 18.7 21.6 24.8 20.2 ...
##  $ AA125_GT_R4     : num  8.83 6.23 7.4 8.65 6.63 ...
##  $ AA125_Yield_R4  : num  0.57 1.016 0.729 0.869 1.028 ...
##  $ Ctrl_Lag_R5     : num  2.46 2.39 2.52 2.6 2.68 ...
##  $ Ctrl_GT_R5      : num  2.66 2.56 2.75 2.8 2.72 ...
##  $ Ctrl_Yield_R5   : num  2.2 2.42 1.86 2.02 1.9 ...
##  $ AA125_Lag_R5    : num  18.6 16.2 18.7 18.3 17.2 ...
##  $ AA125_GT_R5     : num  5.28 4.81 5.05 6.33 5.16 ...
##  $ AA125_Yield_R5  : num  1.74 1.9 1.75 1.73 1.85 ...
##  $ Ctrl_Lag_R6     : num  2.18 2.21 2.15 2.19 2.2 ...
##  $ Ctrl_GT_R6      : num  2.78 2.77 2.73 2.79 2.65 ...
##  $ Ctrl_Yield_R6   : num  2.27 2.07 2.07 2.23 2.26 ...
##  $ AA125_Lag_R6    : num  20.7 18.2 20.9 22.3 20.3 ...
##  $ AA125_GT_R6     : num  5.92 5.36 5.7 7.13 5.48 ...
##  $ AA125_Yield_R6  : num  1.4 1.51 1.5 1.33 1.58 ...

The first column of the dataset Val_data is Container.Name, which ranges from “Well 1” to “Well 200”. The first 100 rows (Well 1 to Well 100) hold the data of the strains from microtiter plate 1, and the next 100 rows (Well 101 to Well 200) hold the data of the strains from microtiter plate 2

  • Transformation of the extracted phenotype in natural logarithm
Val_data[, 41:76] <- log(Val_data[, 5:40])
colnames(Val_data)[41:76] <- paste0("log_", colnames(Val_data)[5:40])

ESTIMATION OF LSC VALUES FOR VALIDATION DATA

We estimated the LSC values for the scan-o-matic data using the CC23 strain, so we employ the same strategy for this Bioscreen experiment. CC23 was present on each plate (plate 1 and 2) and in every replicate (three replicates for each acetic acid condition and six replicates for the basal/Ctrl condition). However, it failed to grow in one of the 150mM acetic acid replicates on plate 2. Therefore, to obtain a relative estimate (LSC), we first average the CC23 response for each plate and each condition and then subtract this average from the response of each strain on the respective plate and in the respective condition. This gives a relative estimate, the Log Strain Co-efficient, for each strain.

  • Extracting the data of CC23 control strain
Val_data_cc <- Val_data[which(Val_data$gRNA_name=="CC23"), ]
  • Estimating the mean
for(j in 1:2){
  #Ctrl_Lag
  Val_data_cc[j, 77] <- mean(as.numeric(Val_data_cc[j, c(41, 47, 53, 59, 65, 71)][which(!is.na(Val_data_cc[j, c(41, 47, 53, 59, 65, 71)]))]))
  #Ctrl_GT
  Val_data_cc[j, 78] <- mean(as.numeric(Val_data_cc[j, c(42, 48, 54, 60, 66, 72)][which(!is.na(Val_data_cc[j, c(42, 48, 54, 60, 66, 72)]))]))
  #Ctrl_Yield
  Val_data_cc[j, 79] <- mean(as.numeric(Val_data_cc[j, c(43, 49, 55, 61, 67, 73)][which(!is.na(Val_data_cc[j, c(43, 49, 55, 61, 67, 73)]))]))
  #AA150_Lag
  Val_data_cc[j, 80] <- mean(as.numeric(Val_data_cc[j, c(44, 50, 56)][which(!is.na(Val_data_cc[j, c(44, 50, 56)]))]))
  #AA150_GT
  Val_data_cc[j, 81] <- mean(as.numeric(Val_data_cc[j, c(45, 51, 57)][which(!is.na(Val_data_cc[j, c(45, 51, 57)]))]))
  #AA150_Yield
  Val_data_cc[j, 82] <- mean(as.numeric(Val_data_cc[j, c(46, 52, 58)][which(!is.na(Val_data_cc[j, c(46, 52, 58)]))]))
  #AA125_Lag
  Val_data_cc[j, 83] <- mean(as.numeric(Val_data_cc[j, c(62, 68, 74)][which(!is.na(Val_data_cc[j, c(62, 68, 74)]))]))
  #AA125_GT
  Val_data_cc[j, 84] <- mean(as.numeric(Val_data_cc[j, c(63, 69, 75)][which(!is.na(Val_data_cc[j, c(63, 69, 75)]))]))
  #AA125_Yield
  Val_data_cc[j, 85] <- mean(as.numeric(Val_data_cc[j, c(64, 70, 76)][which(!is.na(Val_data_cc[j, c(64, 70, 76)]))]))
}
colnames(Val_data_cc)[77:79] <- paste0("Mean_Ctrl_", c("Lag", "GT", "Yield"))
colnames(Val_data_cc)[80:82] <- paste0("Mean_AA150_", c("Lag", "GT", "Yield"))
colnames(Val_data_cc)[83:85] <- paste0("Mean_AA125_", c("Lag", "GT", "Yield"))

Now we use the CC23 response to calculate the LSC values. The mean of the log-transformed phenotypes (as calculated above) was determined separately for plate 1 and plate 2 and then subtracted from the respective phenotypic response of each strain

i.e. log_Phenotype_Strain - mean_log_Phenotype_CC23

The first 100 rows in Val_data are from plate 1. Therefore, we subtract mean_log_Phenotype_CC23_Plate1 from this set, and mean_log_Phenotype_CC23_Plate2 from rows 101-200.

  • Subtraction from plate 1 for Basal and AA150
#Replicate1
for(i in 1:100){
  Val_data[i, 77:82] <- Val_data[i, 41:46]-Val_data_cc["10", 77:82]
}
#Replicate2
for(i in 1:100){
  Val_data[i, 83:88] <- Val_data[i, 47:52]-Val_data_cc["10", 77:82]
}
#Replicate3
for(i in 1:100){
  Val_data[i, 89:94] <- Val_data[i, 53:58]-Val_data_cc["10", 77:82]
}
  • Subtraction from plate 2 for Ctrl and AA150
#Replicate1
for(i in 101:200){
  Val_data[i, 77:82] <- Val_data[i, 41:46]-Val_data_cc["110", 77:82]
}
#Replicate2
for(i in 101:200){
  Val_data[i, 83:88] <- Val_data[i, 47:52]-Val_data_cc["110", 77:82]
}
#Replicate3
for(i in 101:200){
  Val_data[i, 89:94] <- Val_data[i, 53:58]-Val_data_cc["110", 77:82]
}
  • Subtraction from plate 1 for Ctrl and AA125
#Replicate1
for(i in 1:100){
  Val_data[i, 95:100] <- Val_data[i, 59:64]-Val_data_cc["10", c(77:79, 83:85)]
}
#Replicate2
for(i in 1:100){
  Val_data[i, 101:106] <- Val_data[i, 65:70]-Val_data_cc["10", c(77:79, 83:85)]
}
#Replicate3
for(i in 1:100){
  Val_data[i, 107:112] <- Val_data[i, 71:76]-Val_data_cc["10", c(77:79, 83:85)]
}
  • Subtraction from plate 2 for Ctrl and AA125
#Replicate1
for(i in 101:200){
  Val_data[i, 95:100] <- Val_data[i, 59:64]-Val_data_cc["110", c(77:79, 83:85)]
}
#Replicate2
for(i in 101:200){
  Val_data[i, 101:106] <- Val_data[i, 65:70]-Val_data_cc["110", c(77:79, 83:85)]
}
#Replicate3
for(i in 101:200){
  Val_data[i, 107:112] <- Val_data[i, 71:76]-Val_data_cc["110", c(77:79, 83:85)]
}
  • Setting new column names
colnames(Val_data)[77:112] <- paste0("LSC_", colnames(Val_data)[5:40])
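The plate-wise normalization above can be summarized as: compute one CC23 mean per plate and condition, then subtract the plate-matched mean from every replicate. The sketch below shows this for a single phenotype on toy data using `sweep()`; all values and the strain names S1 to S4 are invented, and the table is far smaller than Val_data.

```r
# Toy log-transformed data: 2 plates x 3 wells, one phenotype (invented values)
toy <- data.frame(
  plate     = rep(1:2, each = 3),
  gRNA_name = c("S1", "CC23", "S2", "S3", "CC23", "S4"),
  log_GT_R1 = c(1.2, 1.0, 0.9, 1.5, 1.1, 1.1),
  log_GT_R2 = c(1.3, 1.2, 1.0, 1.4, NA,  1.3)
)

rep_cols <- c("log_GT_R1", "log_GT_R2")
ctrl <- toy[toy$gRNA_name == "CC23", ]

# One control mean per plate, pooling only the replicates that grew
ctrl_mean <- rowMeans(ctrl[, rep_cols], na.rm = TRUE)
names(ctrl_mean) <- ctrl$plate

# Subtract the plate-matched control mean from every replicate column
lsc <- sweep(as.matrix(toy[, rep_cols]), 1, ctrl_mean[as.character(toy$plate)])
colnames(lsc) <- sub("^log_", "LSC_", rep_cols)
```

Averaging the control first means that a control replicate that failed to grow (the NA on plate 2 here) does not remove the normalization for that replicate of the other strains.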

ESTIMATION OF LPI VALUES FOR VALIDATION DATA

  • We estimate the LPI by subtracting the LSC_CTRL values from the respective LSC values under acetic acid

For example, for replicate 1: LPI_AA150 = LSC_AA150_R1 - LSC_CTRL_R1

#Replicate1_LPI_AA150
Val_data[, 113:115] <- Val_data[, 80:82]-Val_data[, 77:79]
colnames(Val_data)[113:115] <- paste0("LPI_", colnames(Val_data)[8:10])
#Replicate2_LPI_AA150
Val_data[, 116:118] <- Val_data[, 86:88]-Val_data[, 83:85]
colnames(Val_data)[116:118] <- paste0("LPI_", colnames(Val_data)[14:16])
#Replicate3_LPI_AA150
Val_data[, 119:121] <- Val_data[, 92:94]-Val_data[, 89:91]
colnames(Val_data)[119:121] <- paste0("LPI_", colnames(Val_data)[20:22])
#Replicate1_LPI_AA125
Val_data[, 122:124] <- Val_data[, 98:100]-Val_data[, 95:97]
colnames(Val_data)[122:124] <- paste0("LPI_", colnames(Val_data)[26:28])
#Replicate2_LPI_AA125
Val_data[, 125:127] <- Val_data[, 104:106]-Val_data[, 101:103]
colnames(Val_data)[125:127] <- paste0("LPI_", colnames(Val_data)[32:34])
#Replicate3_LPI_AA125
Val_data[, 128:130] <- Val_data[, 110:112]-Val_data[, 107:109]
colnames(Val_data)[128:130] <- paste0("LPI_", colnames(Val_data)[38:40])
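
The LPI calculation above relies on fixed column positions. As a hedged alternative sketch, the same difference can be computed by selecting columns by name, which makes it easier to see which LSC columns are paired. The two-row data frame below is a hypothetical example with illustrative values; only the column naming pattern follows the dataset.

```r
# Hypothetical LSC values for two strains in replicate 1 (illustrative numbers).
lsc <- data.frame(
  LSC_Ctrl_Lag_R1    = c(-0.0207, -0.1037),
  LSC_Ctrl_GT_R1     = c(-0.00259, -0.03199),
  LSC_Ctrl_Yield_R1  = c(-0.16, -0.0577),
  LSC_AA150_Lag_R1   = c(-0.0332, -0.3228),
  LSC_AA150_GT_R1    = c(-0.239, -0.329),
  LSC_AA150_Yield_R1 = c(0.451, 0.553),
  row.names = c("strainA", "strainB")
)

phen <- c("Lag", "GT", "Yield")
aa   <- paste0("LSC_AA150_", phen, "_R1")   # acetic acid columns
ctrl <- paste0("LSC_Ctrl_",  phen, "_R1")   # matching control columns

# LPI = LSC under acetic acid - LSC in control medium, phenotype by phenotype.
lpi <- lsc[, aa] - lsc[, ctrl]
colnames(lpi) <- paste0("LPI_AA150_", phen, "_R1")

print(lpi)
```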

ESTIMATION OF MEAN AND SD STATISTICS OF LPI VALUES FOR VALIDATION DATA

We now estimate the mean and the standard deviation (sd) of the LPI for each strain. Some strains did not grow in all three replicates; for those strains, the mean and sd were calculated from the replicates that did grow. When only one replicate grew, sd is NA; when none grew, the mean is NaN and sd is NA. A separate column (N) records how many non-NA replicates were obtained for each strain in each condition.

for (i in 1:nrow(Val_data)){
  #LPI_AA150_Lag
  Val_data[i, 131] <- mean(as.numeric(Val_data[i, c(113, 116, 119)][which(!is.na(Val_data[i, c(113, 116, 119)]))]))
  Val_data[i, 132] <- sd(as.numeric(Val_data[i, c(113, 116, 119)][which(!is.na(Val_data[i, c(113, 116, 119)]))]))
  Val_data[i, 133] <- length(as.numeric(Val_data[i, c(113, 116, 119)][which(!is.na(Val_data[i, c(113, 116, 119)]))]))
  #LPI_AA150_GT
  Val_data[i, 134] <- mean(as.numeric(Val_data[i, c(114, 117, 120)][which(!is.na(Val_data[i, c(114, 117, 120)]))]))
  Val_data[i, 135] <- sd(as.numeric(Val_data[i, c(114, 117, 120)][which(!is.na(Val_data[i, c(114, 117, 120)]))]))
  Val_data[i, 136] <- length(as.numeric(Val_data[i, c(114, 117, 120)][which(!is.na(Val_data[i, c(114, 117, 120)]))]))
  #LPI_AA150_Yield
  Val_data[i, 137] <- mean(as.numeric(Val_data[i, c(115, 118, 121)][which(!is.na(Val_data[i, c(115, 118, 121)]))]))
  Val_data[i, 138] <- sd(as.numeric(Val_data[i, c(115, 118, 121)][which(!is.na(Val_data[i, c(115, 118, 121)]))]))
  Val_data[i, 139] <- length(as.numeric(Val_data[i, c(115, 118, 121)][which(!is.na(Val_data[i, c(115, 118, 121)]))]))
  #LPI_AA125_Lag
  Val_data[i, 140] <- mean(as.numeric(Val_data[i, c(122, 125, 128)][which(!is.na(Val_data[i, c(122, 125, 128)]))]))
  Val_data[i, 141] <- sd(as.numeric(Val_data[i, c(122, 125, 128)][which(!is.na(Val_data[i, c(122, 125, 128)]))]))
  Val_data[i, 142] <- length(as.numeric(Val_data[i, c(122, 125, 128)][which(!is.na(Val_data[i, c(122, 125, 128)]))]))
  #LPI_AA125_GT
  Val_data[i, 143] <- mean(as.numeric(Val_data[i, c(123, 126, 129)][which(!is.na(Val_data[i, c(123, 126, 129)]))]))
  Val_data[i, 144] <- sd(as.numeric(Val_data[i, c(123, 126, 129)][which(!is.na(Val_data[i, c(123, 126, 129)]))]))
  Val_data[i, 145] <- length(as.numeric(Val_data[i, c(123, 126, 129)][which(!is.na(Val_data[i, c(123, 126, 129)]))]))
  #LPI_AA125_Yield
  Val_data[i, 146] <- mean(as.numeric(Val_data[i, c(124, 127, 130)][which(!is.na(Val_data[i, c(124, 127, 130)]))]))
  Val_data[i, 147] <- sd(as.numeric(Val_data[i, c(124, 127, 130)][which(!is.na(Val_data[i, c(124, 127, 130)]))]))
  Val_data[i, 148] <- length(as.numeric(Val_data[i, c(124, 127, 130)][which(!is.na(Val_data[i, c(124, 127, 130)]))]))
}
#Assigning the column names
colnames(Val_data)[131:133] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_Lag")
colnames(Val_data)[134:136] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_GT")
colnames(Val_data)[137:139] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_Yield")
colnames(Val_data)[140:142] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_Lag")
colnames(Val_data)[143:145] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_GT")
colnames(Val_data)[146:148] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_Yield")
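
The loop above repeats the same three statistics for every phenotype. A minimal sketch of the same NA-aware summary as one reusable function is given below; summarise_reps is a hypothetical helper that is not part of the original script, and the toy replicate table is invented for illustration.

```r
# Summarise replicate columns row by row, ignoring NA (no growth).
# `summarise_reps` is a hypothetical helper, not part of the original script.
summarise_reps <- function(df, cols) {
  m <- as.matrix(df[, cols, drop = FALSE])
  data.frame(
    Mean = rowMeans(m, na.rm = TRUE),       # NaN when no replicate grew
    SD   = apply(m, 1, sd, na.rm = TRUE),   # NA when fewer than 2 replicates grew
    N    = rowSums(!is.na(m))               # number of replicates that grew
  )
}

# Toy data: row 1 grew in all replicates, row 2 in one, row 3 in none.
toy <- data.frame(R1 = c(0.1, 0.2, NA),
                  R2 = c(0.3, NA,  NA),
                  R3 = c(0.2, NA,  NA))
res <- summarise_reps(toy, c("R1", "R2", "R3"))
print(res)
```

For the real data the function would be called once per phenotype, e.g. with the column indices c(113, 116, 119) for LPI_AA150_Lag.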

STREAMLINING THE VALIDATION DATASET

At this point, Val_data has 200 rows: 176 selected strains, 7 control strains and blank wells. The 7 control strains were present in both microtiter plates, so six replicates exist for each control strain. In this part, we remove the blank rows, extract the control strain data, generate one row of mean statistics for each control strain, and finally add these rows to the data frame with the 176 selected strains.

  • Extracting the data of all control strains
dCtrl_strains <- c("CC23", "CC14", "CC2", "CC28", "CC30", "CC32", "CC34")
dCTRL_rows <- vector(mode = "integer", length = 0)
test <- data.frame()
Val_data_dCTRL <- data.frame()
for(i in 1:length(dCtrl_strains)){
  test <- Val_data[which(Val_data$gRNA_name==dCtrl_strains[i]), ]
  dCTRL_rows <- c(dCTRL_rows, which(Val_data$gRNA_name==dCtrl_strains[i]))
  Val_data_dCTRL <- rbind(Val_data_dCTRL, test)
}
  • The dCTRL strains were present in both plates. Therefore, we create a new data frame containing only the required columns and calculate the mean and standard deviation over all six replicates of each control strain, together with the number of replicates that grew.
m1 <- vector(mode = "numeric", length = 0)
m2 <- vector(mode = "numeric", length = 0)
test1 <- data.frame()
Val_data_dCTRL_F <- data.frame()
Val_data_dCTRL_F[1:7, 1:3] <- Val_data_dCTRL[c("10", "19", "28", "46", "55", "64", "73"), 2:4]
for (i in 1:length(dCtrl_strains)){
  test1 <- Val_data_dCTRL[which(Val_data_dCTRL$gRNA_name==dCtrl_strains[i]), ]
  #LPI_AA150_Lag
  m1 <- as.numeric(test1[1, c(113, 116, 119)][which(!is.na(test1[1, c(113, 116, 119)]))])
  m2 <- as.numeric(test1[2, c(113, 116, 119)][which(!is.na(test1[2, c(113, 116, 119)]))])
  Val_data_dCTRL_F[i, 4] <- mean(c(m1, m2))
  Val_data_dCTRL_F[i, 5] <- sd(c(m1, m2))
  Val_data_dCTRL_F[i, 6] <- length(c(m1, m2))
  #LPI_AA150_GT
  m1 <- as.numeric(test1[1, c(114, 117, 120)][which(!is.na(test1[1, c(114, 117, 120)]))])
  m2 <- as.numeric(test1[2, c(114, 117, 120)][which(!is.na(test1[2, c(114, 117, 120)]))])
  Val_data_dCTRL_F[i, 7] <- mean(c(m1, m2))
  Val_data_dCTRL_F[i, 8] <- sd(c(m1, m2))
  Val_data_dCTRL_F[i, 9] <- length(c(m1, m2))
  #LPI_AA150_Yield
  m1 <- as.numeric(test1[1, c(115, 118, 121)][which(!is.na(test1[1, c(115, 118, 121)]))])
  m2 <- as.numeric(test1[2, c(115, 118, 121)][which(!is.na(test1[2, c(115, 118, 121)]))])
  Val_data_dCTRL_F[i, 10] <- mean(c(m1, m2))
  Val_data_dCTRL_F[i, 11] <- sd(c(m1, m2))
  Val_data_dCTRL_F[i, 12] <- length(c(m1, m2))
  #LPI_AA125_Lag
  m1 <- as.numeric(test1[1, c(122, 125, 128)][which(!is.na(test1[1, c(122, 125, 128)]))])
  m2 <- as.numeric(test1[2, c(122, 125, 128)][which(!is.na(test1[2, c(122, 125, 128)]))])
  Val_data_dCTRL_F[i, 13] <- mean(c(m1, m2))
  Val_data_dCTRL_F[i, 14] <- sd(c(m1, m2))
  Val_data_dCTRL_F[i, 15] <- length(c(m1, m2))
  #LPI_AA125_GT
  m1 <- as.numeric(test1[1, c(123, 126, 129)][which(!is.na(test1[1, c(123, 126, 129)]))])
  m2 <- as.numeric(test1[2, c(123, 126, 129)][which(!is.na(test1[2, c(123, 126, 129)]))])
  Val_data_dCTRL_F[i, 16] <- mean(c(m1, m2))
  Val_data_dCTRL_F[i, 17] <- sd(c(m1, m2))
  Val_data_dCTRL_F[i, 18] <- length(c(m1, m2))
  #LPI_AA125_Yield
  m1 <- as.numeric(test1[1, c(124, 127, 130)][which(!is.na(test1[1, c(124, 127, 130)]))])
  m2 <- as.numeric(test1[2, c(124, 127, 130)][which(!is.na(test1[2, c(124, 127, 130)]))])
  Val_data_dCTRL_F[i, 19] <- mean(c(m1, m2))
  Val_data_dCTRL_F[i, 20] <- sd(c(m1, m2))
  Val_data_dCTRL_F[i, 21] <- length(c(m1, m2))
}
colnames(Val_data_dCTRL_F)[4:6] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_Lag")
colnames(Val_data_dCTRL_F)[7:9] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_GT")
colnames(Val_data_dCTRL_F)[10:12] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_Yield")
colnames(Val_data_dCTRL_F)[13:15] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_Lag")
colnames(Val_data_dCTRL_F)[16:18] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_GT")
colnames(Val_data_dCTRL_F)[19:21] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_Yield")
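
The pooling of control replicates across the two plates can also be sketched with split() and lapply(). This is only an illustration under simplified assumptions: the toy table has one phenotype with three replicate columns, and pool_stats is a hypothetical helper, not a function from the original analysis.

```r
# Hypothetical layout: each control strain occupies one row per plate,
# with three replicate LPI values per row (NA = no growth).
ctrl <- data.frame(
  gRNA_name = c("CC23", "CC23", "CC14", "CC14"),
  R1 = c( 0.01, -0.02, 0.05, NA),
  R2 = c( 0.03,  0.00, 0.04, 0.02),
  R3 = c(-0.01,  0.02, NA,   0.03)
)

# Pool all non-NA replicates of one strain (both plates) into one summary row.
pool_stats <- function(block) {
  v <- unlist(block[, c("R1", "R2", "R3")], use.names = FALSE)
  v <- v[!is.na(v)]
  data.frame(Mean = mean(v), SD = sd(v), N = length(v))
}

# One block per control strain; row names of the result are the strain names.
pooled <- do.call(rbind, lapply(split(ctrl, ctrl$gRNA_name), pool_stats))
print(pooled)
```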

We will add the recalculated mean (from 6 independent replicates), sd and N (number of replicates that grew) of every control strain to the final dataset.

  • Removing the control strain data from the Val_data dataset.
Val_data_curated <- Val_data[-dCTRL_rows, ]
  • Also removing the blank rows from the data
Val_data_curated <- Val_data_curated[-which(Val_data_curated$gRNA_name=="BLANK"), ]
  • Trimming all non-essential columns for easier data handling
Val_data_column_trimmed <- Val_data_curated[, c(2:4, 131:148)]
  • Binding the column trimmed dataset to the control strain dataset to generate the working data.frame
Validation_LPI_all <- rbind(Val_data_column_trimmed, Val_data_dCTRL_F)
rownames(Validation_LPI_all) <- Validation_LPI_all$gRNA_name
str(Validation_LPI_all)
## 'data.frame':    183 obs. of  21 variables:
##  $ gRNA_name           : chr  "RSM28-TRg-6" "RGL1-NRg-7" "COG1-NRg-3" "COG1-TRg-1" ...
##  $ CTRL_GT_Mean_all    : chr  "0.006053539" "-0.039136252" "-0.04627155" "-0.011279953" ...
##  $ LPI_GT_Mean_all     : chr  "-0.191825576" "-0.278454226" "0.166542866" "0.490160515" ...
##  $ Mean_LPI_AA150_Lag  : num  -0.0724 -0.2242 -0.1437 -0.1664 -0.1687 ...
##  $ SD_LPI_AA150_Lag    : num  0.171 0.0877 0.147 0.1777 0.1468 ...
##  $ N_LPI_AA150_Lag     : int  3 3 3 3 3 3 3 0 0 3 ...
##  $ Mean_LPI_AA150_GT   : num  -0.188 -0.679 -0.501 0.088 -0.441 ...
##  $ SD_LPI_AA150_GT     : num  0.251 0.331 0.187 0.197 0.148 ...
##  $ N_LPI_AA150_GT      : int  3 3 3 3 3 3 3 0 0 3 ...
##  $ Mean_LPI_AA150_Yield: num  0.2935 1.0539 0.9286 0.0906 1.1463 ...
##  $ SD_LPI_AA150_Yield  : num  0.619 0.435 0.133 0.835 0.124 ...
##  $ N_LPI_AA150_Yield   : int  3 3 3 3 3 3 3 0 0 3 ...
##  $ Mean_LPI_AA125_Lag  : num  -0.0151 -0.1182 0.022 0.0555 -0.0532 ...
##  $ SD_LPI_AA125_Lag    : num  0.116 0.116 0.161 0.241 0.221 ...
##  $ N_LPI_AA125_Lag     : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Mean_LPI_AA125_GT   : num  0.1062 -0.0398 0.0342 0.205 0.024 ...
##  $ SD_LPI_AA125_GT     : num  0.225 0.1 0.195 0.142 0.153 ...
##  $ N_LPI_AA125_GT      : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Mean_LPI_AA125_Yield: num  -0.2491 0.0248 -0.0137 -0.0171 0.0335 ...
##  $ SD_LPI_AA125_Yield  : num  0.587 0.24 0.473 0.281 0.407 ...
##  $ N_LPI_AA125_Yield   : int  3 3 3 3 3 3 3 3 3 3 ...

STATISTICAL ANALYSIS

ESTIMATION OF P-VALUE AND P-ADJUSTED VALUES

To identify strains that showed significant tolerance or sensitivity to acetic acid in the liquid growth experiment, we use the Val_data_curated dataset (see STREAMLINING THE VALIDATION DATASET). The rows of the control strains and all blank rows have already been removed from this dataset.

  • Next, extract all the control strain data from the compiled validation dataset, i.e. Val_data
Val_whole_data_dCTRL <- Val_data[dCTRL_rows, ]

For the statistical test, similar to scan-o-matic statistical method 4, the null hypothesis was that the mean (µ) phenotypic performance of a specific CRISPRi strain (StrainX) over all independent experimental replicates (n=3) does not differ from the mean phenotypic performance of all the CRISPRi control strains (with gRNAs targeting no genetic locus in S. cerevisiae), and that any difference within the phenotypic range of the CRISPRi control strains (LPI GT range) arises by chance.

Null Hypothesis : µStrainX(All_replicates_LPI_lag/GT/Yield)- µCRISPRi_Control_Strains(LPI_lag/GT/Yield) = 0

First, all replicate LPI values of all control strains at 125 mM acetic acid are extracted for lag, GT and Yield, respectively, and saved into new vectors.

Val_dCTRL_lag_125 <- c(as.numeric(Val_whole_data_dCTRL[, 122]), as.numeric(Val_whole_data_dCTRL[, 125]), as.numeric(Val_whole_data_dCTRL[, 128]))
Val_dCTRL_GT_125 <- c(as.numeric(Val_whole_data_dCTRL[, 123]), as.numeric(Val_whole_data_dCTRL[, 126]), as.numeric(Val_whole_data_dCTRL[, 129]))
Val_dCTRL_Yield_125 <- c(as.numeric(Val_whole_data_dCTRL[, 124]), as.numeric(Val_whole_data_dCTRL[, 127]), as.numeric(Val_whole_data_dCTRL[, 130]))
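  • t-test (Welch's two-sample t-test, the default of t.test; strains with fewer than two non-NA replicates get NA)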
for(i in 1:nrow(Val_data_curated)){
  test_lag <- t(Val_data_curated[i, c(122, 125, 128)])
  test_GT <- t(Val_data_curated[i, c(123, 126, 129)])
  test_Yield <- t(Val_data_curated[i, c(124, 127, 130)])
  x1 <- sum(!is.na(test_lag[, 1]))
  x2 <- sum(!is.na(test_GT[, 1]))
  x3 <- sum(!is.na(test_Yield[, 1]))
  if(x1>1){
    P.value_lag_125<- t.test(Val_dCTRL_lag_125, test_lag[which(!is.na(test_lag[, 1]))])
    Val_data_curated[i, 149] <- P.value_lag_125$p.value
  } else {
    Val_data_curated[i, 149] <- NA
  }
  if(x2>1){
    P.value_GT_125<- t.test(Val_dCTRL_GT_125, test_GT[which(!is.na(test_GT[, 1]))])
    Val_data_curated[i, 150] <- P.value_GT_125$p.value
  } else {
    Val_data_curated[i, 150] <- NA
  }
  if(x3>1){
    P.value_Yield_125<- t.test(Val_dCTRL_Yield_125, test_Yield[which(!is.na(test_Yield[, 1]))])
    Val_data_curated[i, 151] <- P.value_Yield_125$p.value
  } else {
    Val_data_curated[i, 151] <- NA
  }
}
colnames(Val_data_curated)[149:151] <- c("P.value_lag_125", "P.value_GT_125", "P.value_Yield_125")
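
The three if/else branches above apply the same rule to each phenotype. A minimal sketch of that rule as a single function is shown below; welch_p is a hypothetical helper, and the control and strain values are made up for illustration.

```r
# Hypothetical helper: Welch two-sample t-test of one strain's replicates
# against the pooled control replicates; NA if fewer than 2 replicates grew.
welch_p <- function(strain_reps, ctrl_reps) {
  strain_reps <- strain_reps[!is.na(strain_reps)]
  ctrl_reps   <- ctrl_reps[!is.na(ctrl_reps)]
  if (length(strain_reps) < 2) return(NA_real_)
  # t.test() uses var.equal = FALSE by default, i.e. the Welch test.
  t.test(ctrl_reps, strain_reps)$p.value
}

# Made-up pooled control replicates (e.g. 7 controls x 6 replicates = 42 values).
ctrl_reps <- rep(c(-0.1, 0, 0.1), 14)

p_shifted <- welch_p(c(0.9, 1.0, 1.1), ctrl_reps)  # clearly different from controls
p_single  <- welch_p(c(0.5, NA, NA),  ctrl_reps)   # only one replicate grew

print(c(shifted = p_shifted, single = p_single))
```

With such a helper, one call per phenotype inside the loop would replace each if/else branch.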
  • Next, P-value adjustment for multiple testing by the Benjamini-Hochberg (FDR) method
Val_data_curated[which(!is.na(Val_data_curated$P.value_lag_125)), 152] <- p.adjust(Val_data_curated$P.value_lag_125[which(!is.na(Val_data_curated$P.value_lag_125))], 
                                                                                                 method = "BH", 
                                                                                                 n = length(Val_data_curated$P.value_lag_125[which(!is.na(Val_data_curated$P.value_lag_125))]))
Val_data_curated[which(!is.na(Val_data_curated$P.value_GT_125)), 153] <- p.adjust(Val_data_curated$P.value_GT_125[which(!is.na(Val_data_curated$P.value_GT_125))], 
                                                                                                method = "BH", 
                                                                                                n = length(Val_data_curated$P.value_GT_125[which(!is.na(Val_data_curated$P.value_GT_125))]))
Val_data_curated[which(!is.na(Val_data_curated$P.value_Yield_125)), 154] <- p.adjust(Val_data_curated$P.value_Yield_125[which(!is.na(Val_data_curated$P.value_Yield_125))], 
                                                                                                   method = "BH", 
                                                                                                   n = length(Val_data_curated$P.value_Yield_125[which(!is.na(Val_data_curated$P.value_Yield_125))]))
colnames(Val_data_curated)[152:154] <- c("P.adj_lag_125", "P.adj_GT_125", "P.adj_Yield_125")
rownames(Val_data_curated) <- Val_data_curated$gRNA_name
str(Val_data_curated)
## 'data.frame':    176 obs. of  154 variables:
##  $ Container.Name      : chr  "Well 1" "Well 2" "Well 3" "Well 4" ...
##  $ gRNA_name           : chr  "RSM28-TRg-6" "RGL1-NRg-7" "COG1-NRg-3" "COG1-TRg-1" ...
##  $ CTRL_GT_Mean_all    : chr  "0.006053539" "-0.039136252" "-0.04627155" "-0.011279953" ...
##  $ LPI_GT_Mean_all     : chr  "-0.191825576" "-0.278454226" "0.166542866" "0.490160515" ...
##  $ Ctrl_Lag_R1         : num  2.27 2.09 2.2 2.24 2.13 ...
##  $ Ctrl_GT_R1          : num  2.65 2.57 2.61 2.7 2.59 ...
##  $ Ctrl_Yield_R1       : num  1.84 2.04 1.97 2 1.97 ...
##  $ AA150_Lag_R1        : num  54.5 40.8 47 47.1 46.7 ...
##  $ AA150_GT_R1         : num  20 18.3 17.8 27.2 14.6 ...
##  $ AA150_Yield_R1      : num  0.225 0.25 0.294 0.208 0.475 ...
##  $ Ctrl_Lag_R2         : num  2.09 1.96 2.03 2.1 1.97 ...
##  $ Ctrl_GT_R2          : num  2.94 2.53 2.58 2.68 2.57 ...
##  $ Ctrl_Yield_R2       : num  1.86 2.24 2.1 1.96 2.06 ...
##  $ AA150_Lag_R2        : num  38.9 34.7 36.5 35.7 34.2 ...
##  $ AA150_GT_R2         : num  18.7 10.4 15.7 23.6 18.8 ...
##  $ AA150_Yield_R2      : num  0.246 0.654 0.408 0.26 0.4 ...
##  $ Ctrl_Lag_R3         : num  2.33 2.36 2.25 2.35 2.19 ...
##  $ Ctrl_GT_R3          : num  2.8 2.74 2.66 2.75 2.5 ...
##  $ Ctrl_Yield_R3       : num  2.04 2.06 2.19 2.24 2.61 ...
##  $ AA150_Lag_R3        : num  60 49.9 54.3 57 49.7 ...
##  $ AA150_GT_R3         : num  29.2 10.8 12.6 35.7 14.3 ...
##  $ AA150_Yield_R3      : num  0.0891 0.401 0.3597 0.0628 0.5105 ...
##  $ Ctrl_Lag_R4         : num  2.28 2.25 2.19 2.25 2.12 ...
##  $ Ctrl_GT_R4          : num  2.92 2.75 2.75 2.89 2.6 ...
##  $ Ctrl_Yield_R4       : num  2.21 2.03 1.94 1.76 2.38 ...
##  $ AA125_Lag_R4        : num  20.2 18.7 21.6 24.8 20.2 ...
##  $ AA125_GT_R4         : num  8.83 6.23 7.4 8.65 6.63 ...
##  $ AA125_Yield_R4      : num  0.57 1.016 0.729 0.869 1.028 ...
##  $ Ctrl_Lag_R5         : num  2.46 2.39 2.52 2.6 2.68 ...
##  $ Ctrl_GT_R5          : num  2.66 2.56 2.75 2.8 2.72 ...
##  $ Ctrl_Yield_R5       : num  2.2 2.42 1.86 2.02 1.9 ...
##  $ AA125_Lag_R5        : num  18.6 16.2 18.7 18.3 17.2 ...
##  $ AA125_GT_R5         : num  5.28 4.81 5.05 6.33 5.16 ...
##  $ AA125_Yield_R5      : num  1.74 1.9 1.75 1.73 1.85 ...
##  $ Ctrl_Lag_R6         : num  2.18 2.21 2.15 2.19 2.2 ...
##  $ Ctrl_GT_R6          : num  2.78 2.77 2.73 2.79 2.65 ...
##  $ Ctrl_Yield_R6       : num  2.27 2.07 2.07 2.23 2.26 ...
##  $ AA125_Lag_R6        : num  20.7 18.2 20.9 22.3 20.3 ...
##  $ AA125_GT_R6         : num  5.92 5.36 5.7 7.13 5.48 ...
##  $ AA125_Yield_R6      : num  1.4 1.51 1.5 1.33 1.58 ...
##  $ log_Ctrl_Lag_R1     : num  0.822 0.739 0.787 0.808 0.757 ...
##  $ log_Ctrl_GT_R1      : num  0.973 0.944 0.958 0.994 0.952 ...
##  $ log_Ctrl_Yield_R1   : num  0.609 0.712 0.676 0.694 0.677 ...
##  $ log_AA150_Lag_R1    : num  4 3.71 3.85 3.85 3.84 ...
##  $ log_AA150_GT_R1     : num  3 2.91 2.88 3.3 2.68 ...
##  $ log_AA150_Yield_R1  : num  -1.49 -1.388 -1.224 -1.57 -0.744 ...
##  $ log_Ctrl_Lag_R2     : num  0.739 0.674 0.706 0.74 0.68 ...
##  $ log_Ctrl_GT_R2      : num  1.079 0.927 0.948 0.986 0.945 ...
##  $ log_Ctrl_Yield_R2   : num  0.618 0.807 0.74 0.674 0.723 ...
##  $ log_AA150_Lag_R2    : num  3.66 3.55 3.6 3.57 3.53 ...
##  $ log_AA150_GT_R2     : num  2.93 2.34 2.75 3.16 2.94 ...
##  $ log_AA150_Yield_R2  : num  -1.402 -0.424 -0.897 -1.348 -0.916 ...
##  $ log_Ctrl_Lag_R3     : num  0.845 0.861 0.811 0.855 0.784 ...
##  $ log_Ctrl_GT_R3      : num  1.031 1.009 0.978 1.012 0.918 ...
##  $ log_Ctrl_Yield_R3   : num  0.712 0.724 0.784 0.804 0.959 ...
##  $ log_AA150_Lag_R3    : num  4.09 3.91 3.99 4.04 3.91 ...
##  $ log_AA150_GT_R3     : num  3.38 2.38 2.53 3.58 2.66 ...
##  $ log_AA150_Yield_R3  : num  -2.418 -0.914 -1.022 -2.768 -0.672 ...
##  $ log_Ctrl_Lag_R4     : num  0.826 0.809 0.786 0.813 0.751 ...
##  $ log_Ctrl_GT_R4      : num  1.07 1.011 1.01 1.061 0.954 ...
##  $ log_Ctrl_Yield_R4   : num  0.793 0.708 0.665 0.566 0.867 ...
##  $ log_AA125_Lag_R4    : num  3 2.93 3.07 3.21 3.01 ...
##  $ log_AA125_GT_R4     : num  2.18 1.83 2 2.16 1.89 ...
##  $ log_AA125_Yield_R4  : num  -0.5618 0.0163 -0.3155 -0.1408 0.0275 ...
##  $ log_Ctrl_Lag_R5     : num  0.899 0.873 0.925 0.957 0.985 ...
##  $ log_Ctrl_GT_R5      : num  0.977 0.938 1.011 1.031 1 ...
##  $ log_Ctrl_Yield_R5   : num  0.79 0.885 0.622 0.702 0.644 ...
##  $ log_AA125_Lag_R5    : num  2.92 2.79 2.93 2.9 2.84 ...
##  $ log_AA125_GT_R5     : num  1.66 1.57 1.62 1.84 1.64 ...
##  $ log_AA125_Yield_R5  : num  0.552 0.642 0.561 0.548 0.614 ...
##  $ log_Ctrl_Lag_R6     : num  0.78 0.793 0.764 0.785 0.787 ...
##  $ log_Ctrl_GT_R6      : num  1.021 1.017 1.003 1.026 0.973 ...
##  $ log_Ctrl_Yield_R6   : num  0.822 0.729 0.73 0.802 0.815 ...
##  $ log_AA125_Lag_R6    : num  3.03 2.9 3.04 3.1 3.01 ...
##  $ log_AA125_GT_R6     : num  1.78 1.68 1.74 1.96 1.7 ...
##  $ log_AA125_Yield_R6  : num  0.34 0.412 0.403 0.284 0.459 ...
##  $ LSC_Ctrl_Lag_R1     : num  -0.0207 -0.1037 -0.0553 -0.0345 -0.0849 ...
##  $ LSC_Ctrl_GT_R1      : num  -0.00259 -0.03199 -0.01737 0.01855 -0.02336 ...
##  $ LSC_Ctrl_Yield_R1   : num  -0.16 -0.0577 -0.0929 -0.0757 -0.0925 ...
##  $ LSC_AA150_Lag_R1    : num  -0.0332 -0.3228 -0.1814 -0.1797 -0.1873 ...
##  $ LSC_AA150_GT_R1     : num  -0.239 -0.329 -0.358 0.0647 -0.557 ...
##  $ LSC_AA150_Yield_R1  : num  0.451 0.553 0.717 0.371 1.197 ...
##  $ LSC_Ctrl_Lag_R2     : num  -0.104 -0.169 -0.136 -0.103 -0.162 ...
##  $ LSC_Ctrl_GT_R2      : num  0.103 -0.049 -0.0273 0.0108 -0.0307 ...
##  $ LSC_Ctrl_Yield_R2   : num  -0.1511 0.0373 -0.029 -0.0954 -0.0465 ...
##  $ LSC_AA150_Lag_R2    : num  -0.369 -0.483 -0.435 -0.456 -0.499 ...
##  $ LSC_AA150_GT_R2     : num  -0.3087 -0.8989 -0.4824 -0.0736 -0.3005 ...
##  $ LSC_AA150_Yield_R2  : num  0.539 1.517 1.044 0.593 1.024 ...
##  $ LSC_Ctrl_Lag_R3     : num  0.00263 0.01828 -0.03106 0.01217 -0.05865 ...
##  $ LSC_Ctrl_GT_R3      : num  0.0557 0.0339 0.0029 0.0364 -0.0577 ...
##  $ LSC_Ctrl_Yield_R3   : num  -0.0573 -0.045 0.0147 0.0352 0.1893 ...
##  $ LSC_AA150_Lag_R3    : num  0.0633 -0.1209 -0.0374 0.012 -0.1254 ...
##  $ LSC_AA150_GT_R3     : num  0.139 -0.855 -0.703 0.339 -0.577 ...
##  $ LSC_AA150_Yield_R3  : num  -0.478 1.027 0.918 -0.828 1.268 ...
##  $ LSC_Ctrl_Lag_R4     : num  -0.0167 -0.033 -0.0569 -0.0293 -0.0917 ...
##  $ LSC_Ctrl_GT_R4      : num  0.0949 0.035 0.0344 0.0852 -0.0211 ...
##  $ LSC_Ctrl_Yield_R4   : num  0.0236 -0.0611 -0.1048 -0.2036 0.0974 ...
##  $ LSC_AA125_Lag_R4    : num  -0.00492 -0.08097 0.06473 0.20324 -0.00257 ...
##  $ LSC_AA125_GT_R4     : num  0.458 0.11 0.281 0.437 0.172 ...
##   [list output truncated]
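
A note on the Benjamini-Hochberg step above: the explicit subsetting of non-NA P-values is safe, but p.adjust() also skips missing values by itself and counts only the non-NA entries as the number of comparisons. The short sketch below, on made-up P-values, checks that both routes give the same adjusted values.

```r
# Made-up raw P-values; NA marks strains that could not be tested.
p_raw <- c(0.001, 0.04, NA, 0.20, 0.03, NA)

# Route 1: let p.adjust() handle the NAs (they stay NA in the output).
p_adj <- p.adjust(p_raw, method = "BH")

# Route 2: adjust only the non-NA subset, as done in the analysis above.
p_adj_subset <- p.adjust(p_raw[!is.na(p_raw)], method = "BH")

# Both routes agree on the tested strains.
stopifnot(isTRUE(all.equal(p_adj[!is.na(p_raw)], p_adj_subset)))
print(p_adj)
```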

CORRELATION ANALYSIS BETWEEN SCAN_O_MATIC LPI VS BIOSCREEN LPI

For the correlation analysis, we need to extract the scan-o-matic data of the strains selected for the validation experiment.

  • Extracting the data from the scan-o-matic analysis dataset
validation_strains_data <- Analysis_Final_3[validation_strains, ]
  • Extracting the dCTRL strain data from the original scan-o-matic data
dCTRL_data_scan_o_matic <- Analysis_Final_3[(Analysis_Final_3$gRNA_name %in% dCtrl_strains), ]
  • Data preparation
validation_strains_data <- rbind(validation_strains_data, dCTRL_data_scan_o_matic)

#Setting the row order to match that of the bioscreen dataset Validation_LPI_all
validation_strains_data <- validation_strains_data[rownames(Validation_LPI_all), ]

We now build a new data frame by extracting only the required columns from both data frames, i.e. the scan-o-matic analysis of the validation strains (validation_strains_data) and the bioscreen liquid growth experiment (Validation_LPI_all).

Validation_new_df <- validation_strains_data[, c(1:8, 96:97, 87, 35:40, 98:99, 89, 100, 102, 104:107, 58, 60, 74:79, 84, 86)]
Validation_new_df <- cbind(Validation_new_df, Validation_LPI_all[, 4:21])
  • Categorizing strains based on their performance in scan-o-matic

We will add a new column with an identifier that marks the 50 most acetic acid tolerant and the 50 most acetic acid sensitive strains in this list. These strains are already listed in the data frame bot_top_50.

bot50 <- bot_top_50[1:48, ]
Top50 <- bot_top_50[49:98, ]

We add a separate column to categorize the strains used in the validation experiment

#dCTRL strains (1)
Validation_new_df[(Validation_new_df$gRNA_name %in% dCtrl_strains), 55] <- 1
#Top 50 acetic acid tolerant strains (2)
Validation_new_df[(Validation_new_df$gRNA_name %in% Top50$gRNA_name), 55] <- 2
#Top 50 acetic acid sensitive strains (3)
Validation_new_df[(Validation_new_df$gRNA_name %in% bot50$gRNA_name), 55] <- 3
#Other candidates (4)
Validation_new_df[which(is.na(Validation_new_df$V55)), 55] <- 4
#Changing the column name
colnames(Validation_new_df)[55] <- "Strain_category"
str(Validation_new_df)
## 'data.frame':    183 obs. of  55 variables:
##  $ gRNA_name                 : chr  "RSM28-TRg-6" "RGL1-NRg-7" "COG1-NRg-3" "COG1-TRg-1" ...
##  $ Seq                       : chr  "GGAATTAAACTTAACGAAAC" "GCTCTTGTTTAGTAGGCGTG" "GACATAAGCATTCGTATAAT" "CATTCGTACAACAAATCTTG" ...
##  $ SOURCEPLATEID             : chr  "R2877.H.002" "R2877.H.002" "R2877.H.001" "R2877.H.001" ...
##  $ SOURCECOLONYCOLUMN        : int  10 3 20 17 17 9 8 8 5 12 ...
##  $ SOURCECOLONYROW           : chr  "E" "P" "L" "I" ...
##  $ GENE                      : chr  "RSM28" "RGL1" "COG1" "COG1" ...
##  $ Control.gRNA              : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536             : chr  "I19" "AE5" "W39" "Q33" ...
##  $ CTRL_GT_Mean_all          : num  0.00605 -0.03914 -0.04627 -0.01128 -0.03901 ...
##  $ n_CTRL                    : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ CTRL_GT_MEAN_RND1_2_SD    : num  0.01124 0.00885 0.066 0.00894 0.0067 ...
##  $ LPI_GT_RND1_R1            : num  -0.2983 -0.383 0.2091 0.5628 0.0449 ...
##  $ LPI_GT_RND1_R2            : num  -0.35 -0.414 0.137 0.449 0.148 ...
##  $ LPI_GT_RND1_R3            : num  -0.254 -0.329 0.141 0.55 0.142 ...
##  $ LPI_GT_RND2_R1            : num  -0.0849 -0.156 0.2351 0.462 0.0213 ...
##  $ LPI_GT_RND2_R2            : num  -0.0374 -0.1805 0.1606 0.3855 -0.0465 ...
##  $ LPI_GT_RND2_R3            : num  -0.126 -0.208 0.116 0.532 0.089 ...
##  $ LPI_GT_Mean_all           : num  -0.1918 -0.2785 0.1665 0.4902 0.0665 ...
##  $ n_LPI                     : int  6 6 6 6 6 6 6 3 6 6 ...
##  $ LPI_GT_MEAN_RND1_2_SD     : num  0.15436 0.13712 0.00591 0.043 0.06396 ...
##  $ P.value_M3                : num  8.93e-03 1.16e-03 5.99e-04 1.47e-05 2.09e-01 ...
##  $ P.adjusted_M3             : num  0.08284 0.03436 0.02674 0.00739 0.39482 ...
##  $ Midpoint_TSS_dist         : int  -70 -134 -173 -34 -132 -163 -160 -181 -127 -246 ...
##  $ Norm_atac_seq_read_density: num  0.66 0.19 0.56 0.79 0.31 0.46 0.17 0.87 0.28 0.45 ...
##  $ Multiple_ORFs_Targeted    : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ nearby_genes              : chr  NA "YPL067C|-25" NA NA ...
##  $ CTRL_Y_RND1_2_MEAN        : num  -0.0142 -0.1249 -0.0559 -0.3699 -0.1031 ...
##  $ CTRL_Y_RND1_2_SD          : num  0.0176 0.1077 0.0252 0.0714 0.0506 ...
##  $ LPI_Y_RND1_R1             : num  0.479 -0.132 -0.173 -0.465 0.215 ...
##  $ LPI_Y_RND1_R2             : num  0.5793 -0.0818 -0.112 -0.5196 0.151 ...
##  $ LPI_Y_RND1_R3             : num  0.49 -0.135 -0.28 -0.492 0.168 ...
##  $ LPI_Y_RND2_R1             : num  -0.00748 0.3632 -0.19109 -0.43542 -0.12262 ...
##  $ LPI_Y_RND2_R2             : num  0.0807 0.4621 -0.118 -0.5155 0.0336 ...
##  $ LPI_Y_RND2_R3             : num  0.0817 0.6341 -0.0195 -0.3559 0.0466 ...
##  $ LPI_Y_RND1_2_MEAN         : num  0.2839 0.1851 -0.149 -0.4639 0.0818 ...
##  $ LPI_Y_RND1_2_SD           : num  0.2588 0.3419 0.0879 0.0617 0.1226 ...
##  $ Mean_LPI_AA150_Lag        : num  -0.0724 -0.2242 -0.1437 -0.1664 -0.1687 ...
##  $ SD_LPI_AA150_Lag          : num  0.171 0.0877 0.147 0.1777 0.1468 ...
##  $ N_LPI_AA150_Lag           : int  3 3 3 3 3 3 3 0 0 3 ...
##  $ Mean_LPI_AA150_GT         : num  -0.188 -0.679 -0.501 0.088 -0.441 ...
##  $ SD_LPI_AA150_GT           : num  0.251 0.331 0.187 0.197 0.148 ...
##  $ N_LPI_AA150_GT            : int  3 3 3 3 3 3 3 0 0 3 ...
##  $ Mean_LPI_AA150_Yield      : num  0.2935 1.0539 0.9286 0.0906 1.1463 ...
##  $ SD_LPI_AA150_Yield        : num  0.619 0.435 0.133 0.835 0.124 ...
##  $ N_LPI_AA150_Yield         : int  3 3 3 3 3 3 3 0 0 3 ...
##  $ Mean_LPI_AA125_Lag        : num  -0.0151 -0.1182 0.022 0.0555 -0.0532 ...
##  $ SD_LPI_AA125_Lag          : num  0.116 0.116 0.161 0.241 0.221 ...
##  $ N_LPI_AA125_Lag           : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Mean_LPI_AA125_GT         : num  0.1062 -0.0398 0.0342 0.205 0.024 ...
##  $ SD_LPI_AA125_GT           : num  0.225 0.1 0.195 0.142 0.153 ...
##  $ N_LPI_AA125_GT            : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Mean_LPI_AA125_Yield      : num  -0.2491 0.0248 -0.0137 -0.0171 0.0335 ...
##  $ SD_LPI_AA125_Yield        : num  0.587 0.24 0.473 0.281 0.407 ...
##  $ N_LPI_AA125_Yield         : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Strain_category           : num  2 2 4 4 4 4 4 4 4 4 ...
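
The numeric category codes assigned above (1 to 4) are easy to mix up when plotting. As a small sketch on a hypothetical four-strain table, the same assignment order can be followed and the codes then converted to a labelled factor. The vectors controls, tolerant and sensitive stand in for dCtrl_strains, Top50$gRNA_name and bot50$gRNA_name.

```r
# Hypothetical strains and membership lists, for illustration only.
df <- data.frame(gRNA_name = c("CC23", "g1", "g2", "g3"),
                 stringsAsFactors = FALSE)
controls  <- c("CC23")
tolerant  <- c("g1")
sensitive <- c("g2")

# Start with the default category and overwrite it in the same order as above.
df$Strain_category <- 4                                   # other candidates
df$Strain_category[df$gRNA_name %in% controls]  <- 1      # control strains
df$Strain_category[df$gRNA_name %in% tolerant]  <- 2      # tolerant strains
df$Strain_category[df$gRNA_name %in% sensitive] <- 3      # sensitive strains

# Labelled version of the same codes, convenient for plot legends.
df$Strain_label <- factor(df$Strain_category, levels = 1:4,
                          labels = c("control", "tolerant", "sensitive", "other"))
print(df)
```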
  • Scatter plot and linear regression analysis
Figure 20 (fig. 3 in manuscript): Scatterplot of the relative performance of the strains in liquid medium with 125mM of acetic acid and in solid medium with 150 mM acetic acid (scan-o-matic screening). The linear regression of the data is displayed with a black line. The mean of the three LPI GT replicates of each strain is plotted, control strains in green, acetic acid sensitive strains in red, acetic acid tolerant strains in blue and remaining strains in black. The names of the genes repressed in the tolerant or sensitive strains are indicated in the plot

## 
## Call:
## lm(formula = Validation_new_df$Mean_LPI_AA125_GT ~ Validation_new_df$LPI_GT_Mean_all)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.35706 -0.10087 -0.00064  0.10143  0.71447 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        0.08035    0.02173   3.697    3e-04 ***
## Validation_new_df$LPI_GT_Mean_all  0.82652    0.03971  20.814   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2574 on 159 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.7315, Adjusted R-squared:  0.7298 
## F-statistic: 433.2 on 1 and 159 DF,  p-value: < 2.2e-16
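
The code behind the regression summary above is not shown in this report. A minimal sketch of how such a fit can be obtained with lm() is given below, assuming a toy data frame with hypothetical column names solid and liquid in place of LPI_GT_Mean_all and Mean_LPI_AA125_GT. Rows with a missing value in either column are dropped by lm(), which corresponds to the "observations deleted due to missingness" line in the summary.

```r
# Toy data: LPI on solid medium vs. liquid medium (made-up values).
toy <- data.frame(
  solid  = c(-0.3, -0.2, 0.0, 0.1, 0.4, 0.5, NA),
  liquid = c(-0.2, -0.1, 0.1, 0.1, 0.4, NA,  0.3)
)

# Linear regression of the liquid-medium LPI on the solid-medium LPI.
fit <- lm(liquid ~ solid, data = toy)

r2     <- summary(fit)$r.squared   # multiple R-squared
n_used <- nobs(fit)                # rows actually used (complete pairs only)
slope  <- unname(coef(fit)["solid"])

print(c(R2 = r2, n = n_used, slope = slope))
```

On a scatter plot of the two columns, abline(fit) would add the fitted line.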

CORRELATION BETWEEN PHENOTYPES IN BIOSCREEN EXPERIMENT

print("At 125mM Acetic Acid")
## [1] "At 125mM Acetic Acid"
print("lag vs GT")
## [1] "lag vs GT"
cor(Validation_new_df$Mean_LPI_AA125_Lag, 
    Validation_new_df$Mean_LPI_AA125_GT,  
    method = "pearson", 
    use = "complete.obs")
## [1] 0.5682123
print("lag vs Yield")
## [1] "lag vs Yield"
cor(Validation_new_df$Mean_LPI_AA125_Lag, 
    Validation_new_df$Mean_LPI_AA125_Yield,  
    method = "pearson", 
    use = "complete.obs")
## [1] -0.6062123
print("GT vs Yield")
## [1] "GT vs Yield"
cor(Validation_new_df$Mean_LPI_AA125_GT, 
    Validation_new_df$Mean_LPI_AA125_Yield,  
    method = "pearson", 
    use = "complete.obs")
## [1] -0.9102591
print("At 150mM Acetic Acid")  
## [1] "At 150mM Acetic Acid"
print("lag vs GT")
## [1] "lag vs GT"
cor(Validation_new_df$Mean_LPI_AA150_Lag, 
    Validation_new_df$Mean_LPI_AA150_GT,  
    method = "pearson", 
    use = "complete.obs")
## [1] 0.1103463
print("lag vs Yield")
## [1] "lag vs Yield"
cor(Validation_new_df$Mean_LPI_AA150_Lag, 
    Validation_new_df$Mean_LPI_AA150_Yield,  
    method = "pearson", 
    use = "complete.obs")
## [1] -0.4132699
print("GT vs Yield")
## [1] "GT vs Yield"
cor(Validation_new_df$Mean_LPI_AA150_GT, 
    Validation_new_df$Mean_LPI_AA150_Yield,  
    method = "pearson", 
    use = "complete.obs")
## [1] -0.8419203
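
The six pairwise calls above can be condensed into one correlation matrix per concentration. The sketch below uses a made-up three-column table. With more than two columns, use = "pairwise.complete.obs" is the option that reproduces the pair-by-pair calls, because use = "complete.obs" would first drop every row that has an NA in any column.

```r
# Made-up mean LPI values with some missing entries.
toy <- data.frame(
  Lag   = c( 0.1,  0.2,  0.3,  NA,  0.5),
  GT    = c( 0.2,  0.1,  0.4,  0.3, 0.6),
  Yield = c(-0.1, -0.2, -0.3, -0.3, NA)
)

# All pairwise Pearson correlations in one call.
cm <- cor(toy, method = "pearson", use = "pairwise.complete.obs")

# The matrix entry equals the two-vector call used in the analysis above.
lag_gt <- cor(toy$Lag, toy$GT, method = "pearson", use = "complete.obs")
stopifnot(isTRUE(all.equal(cm["Lag", "GT"], lag_gt)))

print(round(cm, 3))
```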

PLOTTING HEATMAP

Visualizing the mean of the replicates of the selected strains

#Make a color palette
colfunc5<-colorRampPalette(c("goldenrod4", "goldenrod", "white", "turquoise", "turquoise4"))
plot(rep(1,100), col=colfunc5(100), pch=19,cex=2)

Calling colfunc5(100) creates a color palette of one hundred colors with white at the midpoint. The break argument therefore requires a numeric vector of length 100+1 in increasing order. The breaks should be chosen so that all strains with an LPI value below -0.02 get a shade of goldenrod, i.e. the deeper the shade of goldenrod, the more acetic acid tolerant the strain. Moreover, the colors should be evenly distributed within each range. Hence, we first created a vector of 48 elements equally spaced between -0.5 and -0.05

brk1 <- c(seq(-0.5, -0.05, length.out = 48))

Then five breaks equally spaced between -0.04 and 0.04

brk2 <- c(seq(-0.04, 0.04, length.out = 5))

Finally, all strains with an LPI value greater than 0.02 get a shade of turquoise; the deeper the shade, the more acetic acid sensitive the strain. For this we create another numeric vector of 48 numbers equally spaced between 0.05 and 2. The sensitive range is much wider than the tolerant (fitness) range, which is why the spacing between these breaks is also larger

brk3 <- c(seq(0.05, 2, length.out = 48))

Combining these gives a numeric vector of length 101 (48 + 5 + 48) to be used for the break argument

brk_F <- c(brk1, brk2, brk3)
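
As a quick sanity check, the two requirements stated above (one more break than colors, and strictly increasing values) can be verified directly. This is a minimal sketch that rebuilds the same three segments; the helper `bin_of` is a hypothetical name introduced here only to illustrate which color bin a given LPI value would fall into, and is not part of the original analysis.

```r
# Rebuild the three segments exactly as defined above
brk1 <- seq(-0.5, -0.05, length.out = 48)
brk2 <- seq(-0.04, 0.04, length.out = 5)
brk3 <- seq(0.05, 2, length.out = 48)
brk_F <- c(brk1, brk2, brk3)

n_colors <- 100

# A palette of n colors needs n + 1 breaks, in strictly increasing order
stopifnot(length(brk_F) == n_colors + 1)
stopifnot(all(diff(brk_F) > 0))

# Hypothetical helper: index (1..100) of the color bin containing each value
bin_of <- function(x) findInterval(x, brk_F, rightmost.closed = TRUE)

# A tolerant, a neutral and a sensitive LPI value
print(bin_of(c(-0.3, 0.01, 1)))
```

Low bin indices correspond to the goldenrod end of the palette and high indices to the turquoise end.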

Arranging the rows in decreasing order of their mean phenotypic response (LPI) in generation time under the acetic acid condition in the Scan-o-matic experiment

Validation_new_df <- Validation_new_df[order(Validation_new_df$LPI_GT_Mean_all, decreasing = TRUE), ]

Therefore, the sorted data frame should have the most AA sensitive strains at the beginning. However, the most AA sensitive strains did not grow under the AA condition. Because of their missing values (NA), order() places them in the last 14 rows. We move them to the front of the list

Validation_new_df <- Validation_new_df[c(170:183, 1:169), ]
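
The hard-coded row indices above depend on there being exactly 183 rows with 14 missing values. A possible alternative, sketched here on a small made-up data frame (not the real Validation_new_df), is to let `order()` place the NA rows first through its `na.last` argument, so that sorting and repositioning happen in one step.

```r
# Toy stand-in with two strains that did not grow (NA); values are invented
toy_df <- data.frame(
  gRNA_name       = c("g1", "g2", "g3", "g4", "g5"),
  LPI_GT_Mean_all = c(0.2, NA, 1.5, -0.1, NA),
  stringsAsFactors = FALSE
)

# decreasing = TRUE sorts the observed values from most to least sensitive;
# na.last = FALSE puts the rows with missing values at the front
sorted_df <- toy_df[order(toy_df$LPI_GT_Mean_all,
                          decreasing = TRUE,
                          na.last = FALSE), ]
print(sorted_df)
```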

Note: The LPI_Yield values have an inverse profile relative to generation time (GT). LPI_Yield is positive for acetic acid tolerant strains, i.e. yield is higher than in the control strains, whereas LPI GT is negative because GT is lower than in the control strains. Therefore, we multiplied the mean LPI_Yield values of the Bioscreen output by -1 to avoid confusion in the color profile, and stored the results in two separate columns of the data.frame

Validation_new_df[, 56] <- Validation_new_df[, 43]*(-1)
Validation_new_df[, 57] <- Validation_new_df[, 52]*(-1)
colnames(Validation_new_df)[56:57] <- c("Mean_LPI_AA150_Yield(-1)", "Mean_LPI_AA125_Yield(-1)")
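
The same sign flip can be written with column names instead of positions, which is less fragile if columns are added or reordered. This is a sketch on invented toy values; it assumes that columns 43 and 52 of Validation_new_df are named Mean_LPI_AA150_Yield and Mean_LPI_AA125_Yield, which is inferred from the new column names above and not confirmed by the source.

```r
# Toy stand-in; the two column names are assumed to match columns 43 and 52
toy_df <- data.frame(
  Mean_LPI_AA150_Yield = c(0.3, -0.1, NA),
  Mean_LPI_AA125_Yield = c(0.2, 0.1, -0.4)
)

yield_cols   <- c("Mean_LPI_AA150_Yield", "Mean_LPI_AA125_Yield")
flipped_cols <- paste0(yield_cols, "(-1)")

# Unary minus on a data.frame negates every column; NA values stay NA
toy_df[flipped_cols] <- -toy_df[yield_cols]
print(toy_df)
```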

The heatmap is now plotted from the following columns. The index of each column in Validation_new_df is given in brackets

  • From scan-o-matic data
    • CTRL_GT_Mean_all [9]
    • LPI_GT_Mean_all [18]
  • From validation experiment bioscreen 125mM
    • Mean_LPI_AA125_Lag [46]
    • Mean_LPI_AA125_GT [49]
    • (-1) x Mean_LPI_AA125_Yield [57]
  • From validation experiment bioscreen 150mM
    • Mean_LPI_AA150_Lag [37]
    • Mean_LPI_AA150_GT [40]
    • (-1) x Mean_LPI_AA150_Yield [56]
Figure 21 (fig. S2 in manuscript): Heatmap displaying the relative performance of 183 strains grown in liquid media. Columns A and B show the mean LSC or LPI (n=6) of these strains, based on the solid media Scan-o-matic experiments, and columns C-H show the mean LPI (n=3) of the strains based on growth in liquid media

PCA ANALYSIS

  • Addition of the functional/component groups

The functional groups were manually added to the Validation_new_df dataset. The curated dataset is available within the COMPILED_DATA folder

Validation data with functional groups : Validation_new_df_with_Groups_by_GO.csv

  • Import data
Validation_new_df_Grp <- read.csv("COMPILED_DATA/Validation_new_df_with_Groups_by_GO.csv", stringsAsFactors = FALSE, na.strings = )
rownames(Validation_new_df_Grp) <- Validation_new_df_Grp$gRNA_name
  • PCA analysis

INSTALL : factoextra

library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
#Plotting PCA for only Proteasomal genes and control strains
Functional_group2 <- c("GO:0005839", "GO:0008540", "GO:0008541", "dCTRL")
dataset_pca_125mM_Proteasome <- na.omit(Validation_new_df_Grp[which(Validation_new_df_Grp$Group_BY_GO_Terms %in% Functional_group2), c(46, 49, 52, 55, 6, 58)])
res.pca_Proteasome <- prcomp(dataset_pca_125mM_Proteasome[, 1:3], scale. = TRUE)
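
Before plotting, it can help to check how much variance each principal component captures and how the three input phenotypes load on them. The sketch below uses base R only and randomly generated toy data (the column names Lag, GT and Yield are placeholders, not the real column names), so the numbers it prints say nothing about the actual strains.

```r
set.seed(1)

# Toy data: three phenotypes for 20 hypothetical strains,
# with Yield made to be negatively related to GT
toy_pca_df <- data.frame(Lag = rnorm(20), GT = rnorm(20))
toy_pca_df$Yield <- -0.8 * toy_pca_df$GT + 0.2 * rnorm(20)

# scale. = TRUE standardizes each column to unit variance before the PCA
toy_res <- prcomp(toy_pca_df, scale. = TRUE)

# Proportion of total variance explained by each component
var_explained <- toy_res$sdev^2 / sum(toy_res$sdev^2)
print(round(var_explained, 3))

# Loadings: contribution of each phenotype to each component
print(round(toy_res$rotation, 3))
```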
  • Plotting the PCA
## [1] "GO:0005839 = proteasome core complex"
## [1] "GO:0008540 = proteasome regulatory particle, base subcomplex"
## [1] "GO:0008541 = proteasome regulatory particle, lid subcomplex"
## [1] "dCTRL = Control strains"
Figure 22: PCA plot for only Proteasomal genes and control strains

BAR PLOT (LPI GT) OF STRAINS TARGETING PROTEASOMAL GENES

  • Making a vector with all Proteasome genes tested in validation experiment
Gene_set1_Proteasome <- c("RPN8", "RPN9", "RPN12", "RPT1", "RPT2", "RPT4", "PRE4", "PUP3")
  • Making a vector with the names that are present in the GENE field of all control strains
dCTRL_GENES <- c("Ctrl_14", "Ctrl_2",  "Ctrl_23", "Ctrl_28", "Ctrl_30", "Ctrl_32", "Ctrl_34")
  • Data preparation for the bar plot
barplot_dataset7 <- Validation_new_df_Grp[which(Validation_new_df_Grp$GENE %in% c(Gene_set1_Proteasome, dCTRL_GENES)), c(1, 49, 50)]
colnames(barplot_dataset7)[2:3] <- c("LPI_GT", "SD_GT")
library(reshape)
reshape_barplot_dataset7 <- reshape(data=barplot_dataset7, idvar="gRNA_name",
                                    varying = list(colnames(barplot_dataset7)[2], colnames(barplot_dataset7)[3]),
                                    v.names=c("Mean", "SD"),
                                    times = c("LPI_GT"),
                                    new.row.names = 1:10000,
                                    direction="long")
  • Plotting LPI GT with error bars of strains targeting proteasomal genes and of control strains.
Figure 23: (Fig7A in manuscript) Barplot of relative generation time in liquid medium of CRISPRi strains with gRNAs targeting genes encoding proteasomal subunits (20S CP; core particle, 19S lid or 19S base) and the control strains

  • Statistical data for the above strains
test <- Val_data_curated[as.character(Validation_new_df_Grp[which(Validation_new_df_Grp$GENE %in% Gene_set1_Proteasome), 1]), c(2, 150)]
print("P-value ≤ 0.05")
## [1] "P-value ≤ 0.05"
test[which(test$P.value_GT_125<=0.05), ]
##               gRNA_name P.value_GT_125
## PRE4-NRg-9   PRE4-NRg-9   1.707167e-05
## PRE4-NRg-4   PRE4-NRg-4   3.710185e-08
## PUP3-TRg-6   PUP3-TRg-6   5.746297e-03
## PUP3-TRg-5   PUP3-TRg-5   3.157717e-03
## PUP3-TRg-10 PUP3-TRg-10   1.365917e-02
## RPN12-TRg-6 RPN12-TRg-6   3.882935e-03
## RPN9-NRg-4   RPN9-NRg-4   5.275561e-03
## RPN9-NRg-3   RPN9-NRg-3   1.297081e-02
## RPN9-TRg-1   RPN9-TRg-1   4.770957e-02
## RPN9-NRg-7   RPN9-NRg-7   4.100466e-02
## RPT4-TRg-1   RPT4-TRg-1   3.933973e-02
## RPT4-NRg-2   RPT4-NRg-2   2.772957e-03

BAR PLOT (LSC GT) OF STRAINS TARGETING PROTEASOMAL GENES

  • Data preparation for the bar plot
Val_data_lsc_mean <- Val_data_curated[, 2:4]
for(i in 1:nrow(Val_data_curated)){
  Val_data_lsc_mean[i, 4] <- mean(na.omit(as.numeric(Val_data_curated[i, c(78, 84, 90, 96, 102, 108)])))
  Val_data_lsc_mean[i, 5] <- sd(na.omit(as.numeric(Val_data_curated[i, c(78, 84, 90, 96, 102, 108)])))
}
row.names(Val_data_lsc_mean) <- Val_data_lsc_mean$gRNA_name

barplot_dataset8 <- Val_data_lsc_mean[as.character(barplot_dataset7$gRNA_name), c(1, 4:5)]
barplot_dataset8 <- barplot_dataset8[-c(1:7), ]
colnames(barplot_dataset8)[2:3] <- c("LSC_GT", "SD_GT")
library(reshape)
reshape_barplot_dataset8 <- reshape(data=barplot_dataset8, idvar="gRNA_name",
                                    varying = list(colnames(barplot_dataset8)[2], colnames(barplot_dataset8)[3]),
                                    v.names=c("Mean", "SD"),
                                    times = c("LSC_GT"),
                                    new.row.names = 1:10000,
                                    direction="long")
  • Plotting LSC GT with error bars of strains targeting proteasomal genes and of control strains
Figure 24: Barplot of normalized generation time (LSC GT) on solid medium (Scan-o-matic) of CRISPRi strains with gRNAs targeting genes encoding proteasomal subunits (20S CP; core particle, 19S lid or 19S base)

BOX PLOT (LPI YIELD AND LPI LAG) OF STRAINS TARGETING PROTEASOMAL GENES

LPI YIELD
  • Extracting from the Validation_new_df_Grp dataset all strains with gRNAs targeting proteasomal genes that induced significant acetic acid tolerance (P-value ≤ 0.05)
LID_strains <- Validation_new_df_Grp$gRNA_name[which(Validation_new_df_Grp$Group_BY_GO_Terms %in% c("GO:0008541"))]
BASE_strains <- Validation_new_df_Grp$gRNA_name[which(Validation_new_df_Grp$Group_BY_GO_Terms %in% c("GO:0008540"))]
CP_strains <- Validation_new_df_Grp$gRNA_name[which(Validation_new_df_Grp$Group_BY_GO_Terms %in% c("GO:0005839"))]

LID_strains_sig <- LID_strains[c(1, 6:9)]
BASE_strains_sig <- c("RPT4-NRg-2")
CP_strains_sig <- CP_strains[c(1:2, 5:7)]
  • Plotting box plot for LPI Yield
Figure 25: (Fig7B in manuscript) Boxplot of relative growth yield in liquid medium with the data of all significantly acetic acid tolerant CRISPRi strains with gRNAs targeting genes encoding proteasomal subunits (20S CP; core particle, 19S lid or 19S base) and the control strains

  • Statistical significance of LPI Yield
P_val_dCTRL_LID_Yield <- t.test(as.numeric(as.matrix(Val_whole_data_dCTRL[, c(124, 127, 130)])), 
                          as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% LID_strains_sig), c(124, 127, 130)])))
P_val_dCTRL_LID_Yield$p.value
## [1] 0.0001320927
P_val_dCTRL_BASE_Yield <- t.test(as.numeric(as.matrix(Val_whole_data_dCTRL[, c(124, 127, 130)])), 
                           as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% BASE_strains_sig), c(124, 127, 130)])))
P_val_dCTRL_BASE_Yield$p.value
## [1] 0.08016981
P_val_dCTRL_CP_Yield <- t.test(as.numeric(as.matrix(Val_whole_data_dCTRL[, c(124, 127, 130)])), 
                         as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% CP_strains_sig), c(124, 127, 130)])))
P_val_dCTRL_CP_Yield$p.value
## [1] 0.001141934
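
The comparisons above pool all replicate measurements of a group into one numeric vector before testing. The following sketch reproduces that pattern on simulated values (group sizes, means and spread are invented). It also shows that `t.test()` with its default `var.equal = FALSE` performs a Welch two-sample test, so the two pooled groups need not have equal sizes or variances.

```r
set.seed(42)

# Simulated replicate matrices: rows are strains, columns are 3 replicates
toy_ctrl <- matrix(rnorm(21, mean = 0,   sd = 0.05), ncol = 3)  # 7 control strains
toy_grp  <- matrix(rnorm(15, mean = 0.3, sd = 0.05), ncol = 3)  # 5 test strains

# Flatten each matrix into one vector of measurements and compare the groups
toy_test <- t.test(as.numeric(toy_ctrl), as.numeric(toy_grp))

print(toy_test$method)   # name of the test that was run
print(toy_test$p.value)
```

Because replicates of the same strain are pooled as if independent, the resulting p-value describes the group of measurements as a whole, not individual strains.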
LPI LAG PHASE
  • Plotting box plot for LPI Lag phase
Figure 26: (Fig7C in manuscript) Boxplot of relative lag phase in liquid medium with the data of all significantly acetic acid tolerant CRISPRi strains with gRNAs targeting genes encoding proteasomal subunits (20S CP; core particle, 19S lid or 19S base) and the control strains

  • Statistical significance of LPI Lag
P_val_dCTRL_LID_Lag <- t.test(as.matrix(Val_whole_data_dCTRL[, c(122, 125, 128)]), 
                              as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% LID_strains_sig), c(122, 125, 128)])))
P_val_dCTRL_LID_Lag$p.value
## [1] 0.6740559
P_val_dCTRL_BASE_Lag <- t.test(as.matrix(Val_whole_data_dCTRL[, c(122, 125, 128)]), 
                               as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% BASE_strains_sig), c(122, 125, 128)])))
P_val_dCTRL_BASE_Lag$p.value
## [1] 0.6862921
P_val_dCTRL_CP_Lag <- t.test(as.matrix(Val_whole_data_dCTRL[, c(122, 125, 128)]), 
                             as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% CP_strains_sig), c(122, 125, 128)])))
P_val_dCTRL_CP_Lag$p.value
## [1] 0.05837092